(54) Title: NUCLEIC ACID SEQUENCE ENCODING OVARIAN ANTIGEN, CA125, AND USES THEREOF

(57) Abstract: The present invention provides an isolated nucleic acid molecule comprising sequences encoding the CA125 protein
or a portion thereof. This invention also provides a method to detect ovarian cancer in a subject. Furthermore, this invention provides
a method for the diagnosis of a cancer which expresses CA125 by detecting CA125-expressing cells in the blood or other fluids of
patients. This invention also provides a method of producing CA125 protein. Finally, this invention provides a method to treat or
prevent cancer using a vaccine comprising CA125 nucleic acid or protein.
NUCLEIC ACID SEQUENCE ENCODING
OVARIAN ANTIGEN, CA125, AND USES THEREOF

This application claims the benefit of U.S. Patent Application
No. 60/290,480, filed on 11 May 2001, the content of which
is incorporated into this application.

The invention disclosed herein was made with government 
support under NIH Grants No. CA52477 and CA08748, from the
United States Department of Health and Human Services. 
Accordingly, the U.S. Government has certain rights in this 
invention. 

Throughout this application, various references are referred
to. Disclosures of these publications in their entireties 
are hereby incorporated by reference into this application 
to more fully describe the state of the art to which this 
invention pertains. 

BACKGROUND OF THE INVENTION

CA125 antigen is a serum marker that is used routinely in
gynecologic practice to monitor patients with ovarian
cancer. It is a mullerian duct differentiation antigen that
is overexpressed in epithelial ovarian cancer cells and
secreted into the blood, although its expression is not
entirely confined to ovarian cancer. CA125 was first
identified by Bast and Knapp (1) in 1981 by a monoclonal
antibody (OC125) that had been developed from mice immunized
with an ovarian cancer cell line. These investigators
subsequently developed a radioimmunoassay for the antigen
and showed that serum CA125 levels are elevated in about 80%
of patients with epithelial ovarian cancer (EOC), but in less
than 1% of healthy women (2). Numerous studies since that
time have confirmed the usefulness of CA125 levels in
monitoring the progress of patients with EOC (3-6). Most
reports indicate that a rise in CA125 levels precedes
clinical detection by about 3 months. During chemotherapy,
changes in serum CA125 levels correlate with the course of
the disease. CA125 is being used in the inventors' Medical
Center, and elsewhere, as a surrogate marker for clinical
response in phase II trials of new drugs. On the other
hand, CA125 is not useful in the initial diagnosis of EOC
because of its elevation in a number of benign conditions
(3, 7). Despite this limitation, CA125 is considered to be
one of the best available cancer serum markers; however, more
information on its molecular nature is needed to fully
explore its potential.

Although CA125 antigen was first detected over 20 years ago,
very little is known about its biochemistry and genetics.
Most biochemical studies have concluded that CA125 is a high
molecular weight glycoprotein, although estimates of its
size range from 200 to 2000 kDa, with smaller "subunits"
being described by some investigators (8-13). Most studies
have shown that CA125 is a mucin-type molecule, but others
have claimed that it is a typical glycoprotein with
asparagine-linked sugar chains (14). Another study claimed
that CA125 is a glycosyl-phosphoinositol-linked glycoprotein
(11). Thus, no consensus emerged from these studies
concerning the biochemical nature of this antigen.
Recently, however, our studies have strongly indicated that
CA125 is a typical mucin molecule with a high carbohydrate
content and a preponderance of serine- and threonine-linked
(O-linked) glycan chains (15, 16). Possibly because of the
mucinous nature of CA125, its peptide moiety has been very
difficult to clone. The only published study on this topic
(17) described the isolation of a novel cDNA, later termed
NBR-1 (18), but this species does not seem to have any of
the biochemical characteristics expected for CA125 and may,
in fact, be a transcription factor. Using a rabbit
antiserum to purified CA125, we have now cloned, by
expression cloning, a long partial cDNA sequence
corresponding to a new mucin species (designated
CA125/MUC16) that is a strong candidate for being the
peptide core of the CA125 antigen.
SUMMARY OF THE INVENTION

The invention disclosed herein provides an isolated nucleic 
acid molecule comprising sequences encoding the CA125
protein or a portion thereof. This invention also provides
the gene encoding the CA125 protein. 

In addition, this invention provides a vaccine for cancer 
which expresses CA125 protein comprising an appropriate 
amount of the isolated nucleic acid molecules which, when 
expressed, are capable of producing a product which induces 
an immune response to CA125 protein. This invention also 
provides a vaccine for cancer which expresses CA125 protein 
comprising an appropriate amount of a substance which 
induces an immune response to CA125 protein. This invention 
also provides a method for the diagnosis of a cancer which 
expresses CA125 by detecting CA125-expressing cells in the
blood or other fluids of patients based on the nucleic acid
sequence which encodes CA125. Furthermore, this invention
provides a method for monitoring the therapy of a cancer
which expresses CA125 by measuring the expression of
CA125-expressing cells in the blood or other fluids of patients
based on the nucleic acid sequence which encodes CA125, a
decrease of either the number of CA125-expressing cells or
level of protein expression in the cell, indicating the
success of the therapy. 

In addition, this invention provides a method of producing
CA125 protein comprising steps of: a) constructing a vector
adapted for expression in a cell which comprises the
regulatory elements necessary for expression of nucleic acid
in the cell operatively linked to the nucleic acid encoding
the CA125 protein so as to permit expression thereof; b)
placing the cells of step (a) under conditions allowing the
expression of the CA125 protein; and c) recovering the CA125
protein so expressed.
Finally, this invention provides a nonhuman organism
wherein the expression of CA125 is inhibited.
DETAILED DESCRIPTION OF THE FIGURES
First Series Of Experiments 

Fig. 1. SDS-PAGE analysis of purified CA125 sample. The
gel (3% stacking gel and 5% separating gel) was run under
reducing conditions and stained with silver reagent. The
arrowhead indicates the interface between the stacking and
separating gels. The migration positions of molecular
weight markers (in kDa) are shown on the right hand side.
The bracket indicates the region of the gel used to immunize
a rabbit to produce the polyclonal anti-CA125 serum.

Fig. 2. Nucleotide sequence at 3' end of the B4 clone of
CA125/MUC16. The nucleotide and amino acid sequence for B4
(CA125/MUC16) have been deposited in the GenBank™ under
accession number AF361486. Abbreviations: EOC: epithelial
ovarian cancer; mAb: monoclonal antibody; TR: tandem repeat;
PBS: phosphate buffered saline. * indicates a stop codon. A
polyadenylation signal sequence is underlined.

Fig. 3. Deduced amino acid sequence of CA125/MUC16 (B4)
organized to indicate the regions of homology in the tandem
repeats. Clustered serine and threonine residues are
highlighted in white/shade and conserved cysteine residues
in bold/shade. Potential N-linked glycosylation sites (Asn)
are indicated in bold type. The possible transmembrane
region is underlined and the consensus tyrosine
phosphorylation motif is indicated in regular/shade. *
indicates residues that are perfectly conserved, except in
the last repeat sequence. - indicates gaps introduced to
preserve the best homology in the repeats.

Fig. 4. Northern blot analysis of expression of
CA125/MUC16 in cancer cell lines. The blot was probed with
a biotin-labeled probe (B53) from the tandem repeat region.
1: SW626 (ovarian cancer); 2: 2774 (ovarian cancer); 3:
SK-OV-3 (ovarian cancer); 4: SK-OV-8 (ovarian cancer); 5:
OVCAR-3 (ovarian cancer); 6: COLO316 (ovarian cancer); 7:
MCF-7 (breast cancer); 8: IMR-3 (neuroblastoma); 9: MKN45
(gastric cancer); 10: MCA (sarcoma). Indicated on the top
of the figure (+ or -) is the expression of CA125 in the
cell line as determined by reactivity with anti-CA125
antibodies. The end-point titers for these cell lines with
mAb OC125 were 1- <1:500; 2- <1:500; 3- <1:500; 4-
1:128,000; 5- >1:256,000; 6- 1:4000; 7- <1:500; 8- <1:500;
9- <1:500; 10- <1:500. Screening with mAb VK-8 gave similar
results. The result of probing the blot with a β-actin
probe is shown in the lower half of the figure. Size
standards are indicated on the left side of the gel.

Fig. 5. Deduced amino acid sequence of B4 polynucleotide
(CA125).

Fig. 6. Nucleotide sequence of B4 polynucleotide (CA125).

Fig. 7. Nucleotide sequence of B30 polynucleotide coding
for a different portion of the CA125 gene. 

Fig. 8. Deduced amino acid sequence of B30 polynucleotide 
corresponding to a different portion of the CA125 gene. 

Fig. 9. Expression analysis of CA125 nucleotide clone.
This figure is the result of an expression experiment that 
confirms that the sequence actually codes for CA125, as 
recognized by standard antibodies. 

Second Series Of Experiments 

Fig. 10. Schematic showing the protein and nucleotide sequence
of the 3' end of clone B30. Also shown is the region identical
to the 5' region of clone B4. The end of repeat H and the
non-translated region are shown in detail. The stop codon in the
nucleotide sequence is indicated in bold type. Note that
repeats A-H correspond to repeats 7-14 in Fig. 11.

Fig. 11. Nucleotide sequence of MUC16B.
Fig. 12. Amino acid sequence of MUC16B. 

Fig. 13. Schematic showing relationship of NCBI gene
sequence NT_025133.6 to clone B30 and various expressed
sequence tags and the use of this information in determining
the sequence of MUC16B. Exons are shown as filled boxes and
the orientation of the reading frame (+ or -) is indicated
for each exon.
DETAILED DESCRIPTION OF THE INVENTION

The invention disclosed herein provides an isolated nucleic 
acid molecule comprising sequences encoding the CA125 
protein or a portion thereof. This invention also provides
the gene encoding the CA125 protein. This invention further
comprises the 5' untranslated sequence of the CA125 gene. In
addition, this invention comprises the 3' untranslated 
sequence of the CA125 gene. 

In addition, this invention provides the above isolated
nucleic acid molecule comprising sequence set forth in
Figure 6, or a portion thereof, and the corresponding CA125
protein comprising sequence set forth in Figure 5, or a
portion thereof. Furthermore, this invention provides the
above isolated nucleic acid molecule comprising sequence set 
forth in Figure 7, or a portion thereof, and the 
corresponding CA125 protein sequence set forth in Figure 8, 
or a portion thereof. In an embodiment, the nucleic acid 
comprises sequence set forth in Figure 11, or a portion 
thereof. In another embodiment, the nucleic acid encoding 
protein comprises at least a portion of the amino acid 
sequence set forth in Figure 12, or a portion thereof. 



This invention also provides the above gene comprising 
sequence set forth in Figure 10, or a portion thereof. 

The invention furthermore provides the above isolated 
nucleic acid molecules, wherein the nucleic acid is RNA, 
cDNA, genomic DNA, or synthetic DNA. This invention also 
provides a vector comprising the above nucleic acid 
molecule. In an embodiment, the vector is designated as pBK- 
CMV-B4 comprising sequence set forth in Figure 6, or a 
portion thereof, and the corresponding CA125 protein 
comprising sequence set forth in Figure 5, or a portion 
thereof. In another embodiment, the vector is designated as 
pBK-CMV-B30 comprising sequence set forth in Figure 7, or a
portion thereof, and the corresponding CA125 protein 
comprising sequence set forth in Figure 8, or a portion 
thereof. In yet another embodiment, the vector is designated
as pCMV-Tag-B4 comprising sequence set forth in Figure 6, or
a portion thereof, and the corresponding CA125 protein
comprising sequence set forth in Figure 5, or a portion
thereof. In a further embodiment, the vector is designated
as pCMV-Tag-B30 comprising sequence set forth in Figure 7,
or a portion thereof, and the corresponding CA125 protein
comprising sequence set forth in Figure 8, or a portion
thereof.

This invention provides an expression system comprising the 
above vector. In an embodiment, the system is a eukaryotic 
or prokaryotic system. This invention further provides a 
method for producing CA125 protein comprising the above 
expression system.

This invention further provides an isolated nucleic acid
molecule comprising sequence capable of specifically
hybridizing to the sequences above. In an embodiment, the
nucleic acid molecule is capable of inhibiting the
expression of the CA125 protein. This invention also provides
a method of inhibiting expression of CA125 inside a cell by
vector-directed expression of a short RNA able to hybridize
with the protein-coding RNA of CA125. In another embodiment, the
nucleic acid molecule is at least a 7mer. In another
embodiment, it is at least a 10mer. In a separate
embodiment, the nucleic acid molecule is at least a 20mer.
In a further embodiment, the sequence is unique.
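
The oligomer lengths named above (7mer, 10mer, 20mer) and the requirement that a sequence be unique can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names `sequence_space` and `unique_kmers` and the toy sequences are hypothetical and are not taken from this application, and uniqueness is judged only against whatever background sequences the caller supplies.

```python
def sequence_space(k: int) -> int:
    """Number of distinct DNA oligomers of length k (4 bases per position)."""
    return 4 ** k


def unique_kmers(target: str, background: list[str], k: int) -> list[str]:
    """Return k-mers that occur exactly once in `target` and nowhere in
    any of the `background` sequences (hypothetical helper for illustration)."""
    target = target.upper()

    # Count every k-mer of the target sequence.
    counts: dict[str, int] = {}
    for i in range(len(target) - k + 1):
        kmer = target[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1

    # Collect every k-mer present in the background sequences.
    background_kmers: set[str] = set()
    for seq in background:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            background_kmers.add(seq[i:i + k])

    # Keep k-mers seen once in the target and never in the background.
    return [
        kmer for kmer, n in counts.items()
        if n == 1 and kmer not in background_kmers
    ]


if __name__ == "__main__":
    for k in (7, 10, 20):
        print(k, sequence_space(k))
    print(unique_kmers("ACGTACGTTT", ["GGGTACGG"], 4))
```

Longer oligomers draw from a much larger sequence space, which is one reason a 20mer is more likely than a 7mer to be unique within a large background.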

This invention further provides a method to detect ovarian
cancer in a subject comprising steps of: a) contacting the
above isolated nucleic acid molecule with RNA from a sample
from the subject under conditions permitting the formation
of a hybrid complex, and b) detecting the hybrid complex,
wherein a positive detection indicates the expression of the
antigen and presence of cancer.

Furthermore, this invention provides a method of monitoring
ovarian cancer therapy in a subject comprising steps of: a)
contacting the above isolated nucleic acid molecule with RNA
from a sample from the subject under conditions permitting
the formation of a hybrid complex, and b) measuring the
amount of the hybrid complex, wherein a decrease in the
hybrid complex indicates the success of therapy.

This invention also provides a method for inhibiting the
expression of the CA125 protein comprising contacting an
appropriate amount of the above nucleic acid molecule so
that hybridization of the gene or transcript encoding the
CA125 protein will occur, thereby inhibiting the expression
of the protein. This invention further provides a
composition comprising the above isolated nucleic acid
molecule.

In addition, this invention provides a vaccine for a cancer 
which expresses CA125 protein comprising an appropriate 
amount of the above isolated nucleic acid molecules. 

In a separate embodiment, this invention provides a vaccine
for a cancer which expresses CA125 protein comprising an
appropriate amount of the isolated nucleic acid molecules
which, when expressed, are capable of producing a product
which induces an immune response to CA125 protein. In an
embodiment, the nucleic acid molecule comprises sequences
encoding human CA125 protein or a portion thereof.

In another embodiment, the expressed human sequence is
linked to a carrier. It is known that a carrier can boost
the immune response. The said carrier may be a protein carrier.



In yet another embodiment, the nucleic acid molecule
comprises a nonhuman sequence. In a further embodiment, the
nucleic acid molecule comprises a primate sequence. In an
additional embodiment, the nucleic acid molecule comprises a
murine sequence. In a further embodiment, it comprises a rat
or mouse sequence. In yet another embodiment, the nucleic
acid molecule comprises a synthetic sequence, which, when
expressed, is capable of producing a product which induces
an immune response to CA125 protein.

In addition, this invention provides the vaccine wherein the
sequence hybridizes with or is homologous to the sequences
encoding human CA125 protein. In an embodiment, the vaccine
further comprises a suitable adjuvant. In an embodiment,
the adjuvant is an alum. In another embodiment, the cancer
is an ovarian, pancreatic, breast, endometrial, or lung
carcinoma.

This invention also provides a method to treat a cancer 
which expresses CA125 in a subject comprising administering 
to the subject an appropriate amount of the above vaccine.

This invention also provides the above method, wherein the 
cancer is an ovarian, pancreatic, breast, endometrial, or 
lung carcinoma. 

This invention further provides a vaccine for a cancer which
expresses CA125 comprising an appropriate amount of the
expressed CA125 protein corresponding to the above sequence.

This invention also provides a vaccine for a cancer which
expresses CA125 protein comprising an appropriate amount of
a substance which induces an immune response to CA125
protein. In an embodiment, the substance is a polypeptide or
a peptide. In a separate embodiment, the polypeptide
comprises sequences encoding human CA125 protein or a
portion thereof. In yet another embodiment, the expressed
human sequence is linked to a carrier. In a further
embodiment, the polypeptide comprises a nonhuman sequence.
In a separate embodiment, the polypeptide comprises a
primate sequence. In another embodiment, the polypeptide
comprises a murine sequence. In yet another embodiment, the
polypeptide comprises a synthetic sequence, which, when
expressed, is capable of producing a product which induces
an immune response to CA125 protein. The production of a
synthetic sequence or a hybrid of synthetic and natural
sequences is well-known in this field. In a separate
embodiment, the vaccine further comprises a suitable
adjuvant. In an embodiment, the adjuvant is an alum.

WO 02/092836 PCT/US02/14768

This invention provides the above vaccine, wherein the
expressed protein is conjugated to a protein carrier to
increase the immunogenicity. Furthermore, this invention
provides the above vaccine, wherein the cancer is an
ovarian, pancreatic, breast, endometrial, or lung carcinoma.

Furthermore, this invention provides a method to treat a 
cancer which expresses CA125 in a subject comprising
administering to the subject an appropriate amount of the 
above vaccine. 

This invention also provides a method to prevent a cancer 
which expresses CA125 in a subject comprising administering 
to the subject an appropriate amount of the above vaccine. 
In an embodiment, the cancer is an ovarian, pancreatic, 
breast, endometrial, or lung carcinoma.

In addition, this invention provides a method for the 
diagnosis of a cancer which expresses CA125 by detecting 
CA125-expressing cells in the blood or other fluids of
patients based on the nucleic acid sequence which encodes
CA125. 

This invention also provides a method for monitoring the
therapy of a cancer which expresses CA125 by measuring the
expression of CA125-expressing cells in the blood or other
fluids of patients based on the nucleic acid sequence which
encodes CA125, a decrease of either the number of
CA125-expressing cells or level of protein expression in the cell,
indicating the success of the therapy. In an embodiment, the
detection is based on polymerase chain reaction with
appropriate primers.
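
As a rough illustration of what checking a primer pair might involve, the sketch below computes a reverse complement, a simple melting-temperature estimate, and the expected product length on a template. Everything in it is an assumption for illustration: the function names and the toy template are hypothetical, no CA125 sequence is used, and the Wallace rule (2 °C per A/T, 4 °C per G/C) is only a rough guide that is normally applied to short oligonucleotides.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def amplicon_length(template: str, forward: str, reverse: str):
    """Length of the product bounded by the two primers, or None if
    either primer has no exact match on the template (toy exact matching)."""
    t = template.upper()
    start = t.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement is what appears on the template strand.
    site = reverse_complement(reverse)
    end = t.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start


if __name__ == "__main__":
    toy_template = "GATTACAGGGCCCTTTAAAACGGTCAT"  # hypothetical sequence
    print(amplicon_length(toy_template, "GATTACA", "ATGACCGT"))
    print(wallace_tm("GATTACA"))
```

In practice primer design also considers secondary structure, primer dimers, and specificity against the whole transcriptome, none of which this sketch attempts.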

This invention further provides a method of producing CA125
protein comprising steps of: a) constructing a vector adapted
for expression in a cell which comprises the regulatory
elements necessary for expression of nucleic acid in the
cell operatively linked to the nucleic acid encoding the
CA125 protein so as to permit expression thereof; b) placing
the cells of step (a) under conditions allowing the
expression of the CA125 protein; and c) recovering the CA125
protein so expressed. In an embodiment, the cell type is
selected from the group consisting of bacterial cells, yeast
cells, insect cells, and mammalian cells.

This invention also provides the CA125 protein expressed by the
above method. This invention also provides a method for
production of antibodies against CA125 protein using the
protein. This invention also provides the antibodies produced
by the above method. This invention also provides a method of
diagnosis of cancer which expresses CA125 using the antibodies
above. This invention also provides a method for monitoring the
therapy of cancer which expresses CA125 using the above antibodies.

This invention further provides a method for determining the
immunoreactive part of CA125 comprising contacting
antibodies which are known to be reactive to CA125 with the
protein above. Furthermore, this invention provides a
transgenic nonhuman organism comprising the above isolated
nucleic acid molecule. In an embodiment, the organism is a
transgenic nonhuman mammal.

This invention also provides a nonhuman organism, wherein 
the expression of CA125 is inhibited. In an embodiment, the 
organism is a nonhuman mammal. In a separate embodiment, the 
mammal is a mouse.

Finally, this invention further provides a method for 
screening a compound for treatment of cancer which expresses 
CA125 protein comprising administering the compound to the 
transgenic nonhuman organism above, a decrease in expression
of CA125 protein indicating that the compound may be useful 
for treatment of the cancer. In an embodiment, the cancer is 
an ovarian, pancreatic, breast, endometrial, or lung 
carcinoma. 
The invention will be better understood by reference to the
Experimental Details which follow, but those skilled in the
art will readily appreciate that the specific experiments
detailed are only illustrative, and are not meant to limit
the invention as described herein, which is defined by the
claims which follow thereafter.

CA125 is an ovarian cancer antigen that is the basis for a
widely-used serum assay for the monitoring of patients with
ovarian cancer; however, detailed information on its
biochemical and molecular nature is lacking. The inventors
now report the isolation of a long, but partial, cDNA that
corresponds to the CA125 antigen. A rabbit polyclonal
antibody produced to purified CA125 antigen was used to
screen a λZAP cDNA library from OVCAR-3 cells in Escherichia
coli. The longest insert from the 53 positive isolated
clones had a 5965 b.p. sequence containing a stop codon and
a poly A sequence but no clear 5' initiation sequence. The
deduced amino acid sequence has many of the attributes of a
mucin molecule and was designated CA125/MUC16. These
features include a high serine, threonine, and proline
content in an N-terminal region of nine partially conserved
tandem repeats (156 amino acids each) and a C-terminal
region non-tandem repeat sequence containing a possible
transmembrane region and a potential tyrosine
phosphorylation site. Northern blotting showed that the
level of MUC16 mRNA correlated with the expression of CA125
in a panel of cell lines. The molecular cloning of
CA125/MUC16 antigen will lead to a better understanding of
its role in ovarian cancer.

EXPERIMENTAL DETAILS

First Series of Experiments
Materials and Methods 

NIH:OVCAR-3 cell line was obtained from the American Type
Culture Collection (Rockville, MD). Anti-CA125 antibody mAb
OC125 was a generous gift from Dr. R. Bast, Jr. mAb VK-8,
developed in the inventors' laboratory by immunization of
mice with human ovarian cancer cell line OVCAR-3, also
identifies CA125 but reacts with a different epitope(s) than
OC125 (15). Tumor cell lines were from the Sloan-Kettering
Institute Cell Bank.



Purification of CA125 Antigen 

CA125 was purified from the culture supernatant of
NIH:OVCAR-3 cells in a simple two-step procedure (15).
Briefly, the cells were cultured as a monolayer in a
synthetic medium (ITS, Life Technologies, Grand Island, NY)
in RPMI medium containing 1% fetal bovine serum (FBS) and
the culture medium was harvested every 7 days. A total of
31 liters of supernatant medium was concentrated 10-fold and
precipitated with perchloric acid (0.6 M final
concentration). After centrifuging, the neutralized
supernatant was passed through a column of normal mouse
Ig-agarose (30 ml; 1.0 mg/ml) and then through a column of VK-8
mAb (80 ml; 2.0 mg/ml). The antibodies were linked to
Actigel ALD gel according to the manufacturer's directions
(Sterogene Bioseparations, Inc., Carlsbad, CA). The VK-8
column was washed at 4°C with PBS, then with 1 M NaCl in PBS,
and finally eluted with 3 M MgCl2. Fractions (6.0 ml) were
collected and assayed for CA125 antigen by ELISA with mAb
VK-8 as described (15). Fractions from the MgCl2 eluate
containing CA125 reactivity were pooled and used in
subsequent studies. Analysis by SDS-PAGE and silver
staining (Fig. 1) showed that the sample consisted of very
high molecular weight components migrating in the stacking
gel and in a region just below the gel interface; all these
species were reactive with mAb OC125 (data not shown). The
sample also contained a lower molecular weight species
originating from the FBS used in the cell cultures. The
amino acid content of the sample was determined as described
previously (15).
Production of a Rabbit Antiserum to CA125 Antigen

The CA125 sample was further purified by preparative
SDS-PAGE and the high molecular weight region of the gel
indicated in Fig. 1 was excised. After homogenization in
incomplete Freund's adjuvant the gel was used to immunize a
rabbit (NZB white, female) by 3 subcutaneous injections, 1
week apart, in 8 sites. Serum was obtained from the rabbit
10 days after the final immunization. An aliquot (3.0 ml)
of the serum was absorbed with a pellet of melanoma cells
(SK-MEL-28, -23, -30 and -33; 6.7 ml) that had been treated
with 0.2% NP40 and 0.1% protease inhibitor cocktail (Sigma
Co., St. Louis, MO) and the absorbed serum was used to
screen a cDNA library.

Screening of OVCAR-3 cDNA Library

A cDNA library was constructed from OVCAR-3 mRNA in the λZAP
Express vector in E. coli as described by the manufacturer
(Stratagene, La Jolla, CA). The library contained 7.5 X 10^
p.f.u. The library was plated onto 15 plates at
approximately 30,000 pfu/150 mm plate and plaques were
transferred to nitrocellulose and screened with the absorbed
rabbit antiserum (1:500). Positive plaques were identified
using anti-rabbit Ig-horseradish peroxidase conjugate
(Southern Biotechnology Assoc., Birmingham, AL) and
4-chloro-1-naphthol reagent. After subcloning three times and
retesting with antiserum, 54 positive clones remained.
These clones contained inserts ranging from 1.5 to >4.0 kbp
and were designated pBK-CMV-B1 to B54.
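
The plating figures above imply a simple piece of arithmetic about the scale of the screen. The sketch below works it through; the function name `screen_summary` is hypothetical, and the numbers are the approximate values quoted in this section (15 plates, about 30,000 pfu per plate, 54 positive clones).

```python
def screen_summary(plates: int, pfu_per_plate: int, positives: int) -> dict:
    """Summarize a plaque screen: total plaques screened and hit frequency."""
    screened = plates * pfu_per_plate
    return {
        "plaques_screened": screened,
        "hit_frequency": positives / screened,
        "plaques_per_positive": screened / positives,
    }


if __name__ == "__main__":
    summary = screen_summary(plates=15, pfu_per_plate=30_000, positives=54)
    for key, value in summary.items():
        print(key, value)
```

With these inputs the screen covers roughly 450,000 plaques, that is, on the order of one confirmed positive per 8,000 plaques examined.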



DNA Sequencing and Sequence Analysis 

The nucleotide sequence of the longest insert (B4) was
determined using Big Dye terminators (PE Biosystems) and run
on ABI 3700 or ABI 377 DNA sequencer by the Cornell
University BioResource Center, Ithaca, NY. Using the T3
primer and then a series of internal sequencing primers,
corresponding to less conserved regions of the gene, a 5965
bp sequence was identified in B4. Partial sequencing of the
other inserts demonstrated that the majority corresponded to
different parts of the B4 sequence.

Northern Blot Analysis 

mRNA was isolated from a panel of human tumor cell lines,
which had been serologically typed for CA125 expression,
using an mRNA Isolation System kit (Invitrogen, Carlsbad,
CA). mRNA samples (3 μg) were denatured with formaldehyde,
separated by electrophoresis in 1.0% agarose and transferred
to nylon sheets (Gene Screen Plus, NEN, Boston, MA). The
blot was hybridized with a biotin-labeled probe from an
insert containing 3 tandem repeat regions (B53) using a
chemiluminescence procedure following the manufacturer's
directions (Renaissance reagent; NEN, Boston, MA).



Serological Analysis 

Tumor cell lines were assayed for CA125 expression with mAb 
OC125 and VK-8 using a red cell rosetting method as 
described previously (15).

RESULTS 



Cloning of CA125/MUC16 cDNA 

Although most studies on the molecular cloning of mucins
utilized polyclonal antisera raised to the deglycosylated
mucin (apomucin), in this study we used a rabbit antiserum
prepared against the native CA125 antigen. CA125 was
purified by affinity chromatography on an anti-CA125
antibody (mAb VK-8) column by elution under mild conditions
with a chaotropic ion (3 M MgCl2) as described previously
(15). The purified sample had an amino acid composition
similar to that found in other mucins (Table 1) and
extremely high CA125 activity (2 X 10^ units/mg protein). To
immunize rabbits the preparation was further purified by
SDS-PAGE and gel slices containing high molecular weight
CA125 antigen (Fig. 1) were used as the immunogen (in
incomplete Freund's adjuvant). The resulting antiserum was
absorbed with a pellet of non-ovarian cancer cells, after
partially solubilizing the cells in 0.2% NP-40, to remove
non-specific antibodies.

Table 1. Comparison of Amino Acid Content of Purified CA125 and Deduced
Amino Acid Composition of CA125/MUC16 and Its Tandem Repeat Region

Amino Acid    Purified CA125    CA125/MUC16    CA125/MUC16 (TR)
              moles %           moles %        moles %

Asn           8.5               8.9            8.1
Glx           7.8               8.1            7.5
Ser           11.0              8.7            8.9
Gly           9.0               7.4            7.6
His                             2.8            2.9
Arg           4.6               5.9            6.3
Thr           12.4              11.6           12.7
Ala           3.8               3.1            2.9
Pro           8.7               8.1            9.0
Tyr           2.6               3.8            3.3
Val           5.2               5.0            4.7
Met           1.2               1.1            1.0
Cys                             1.4            1.2
Iso           2.7               3.3            3.1
Leu           12.4              13.4           13.7
Phe           3.7               3.9            3.6
Lys           3.8               3.0            2.9
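
A deduced composition of the kind compared in Table 1 can be computed directly from a translated sequence. The sketch below is a minimal illustration, not the procedure used for the table: the function name `mole_percent`, the `pool_amides` option, and the toy peptides are hypothetical. Pooling Asp with Asn and Glu with Gln is offered because acid hydrolysis converts the amides to the acids, so measured compositions usually report the pooled values.

```python
from collections import Counter

# One-letter codes pooled when comparing with hydrolysate data.
POOLED = {"D": "Asx", "N": "Asx", "E": "Glx", "Q": "Glx"}


def mole_percent(protein: str, pool_amides: bool = False) -> dict:
    """Mole percent of each residue in a one-letter protein sequence.

    Non-letter characters (gaps, stop symbols, whitespace) are ignored.
    """
    counts = Counter()
    for residue in protein.upper():
        if not residue.isalpha():
            continue
        key = POOLED.get(residue, residue) if pool_amides else residue
        counts[key] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: round(100 * n / total, 1) for key, n in counts.items()}


if __name__ == "__main__":
    print(mole_percent("SSTTPA"))                  # toy peptide
    print(mole_percent("DNEQ", pool_amides=True))  # pooled Asx/Glx
```

Applied to the full deduced sequence and to the tandem repeat region alone, a function of this kind would yield the two deduced columns of such a comparison.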



The absorbed antiserum was used to screen a cDNA
library from OVCAR-3 cells expressed in E. coli. Fifty-four
positive clones were detected and 53 inserts were sequenced.
Initial sequencing of the longest clone (B4) showed that it
had 9 partially conserved repeats of 495 b.p. each and a
short non-repetitive 3' region. Further sequencing with
internal primers extended the 3' end of the sequence to
include a stop codon, a polyadenylation signal and a poly A
region for a total of 5965 b.p. (Fig. 2). No clear
initiation sequence (ATG in a Kozak box) was detected at the
5'-end, indicating that the derived sequence is incomplete.
The majority of the other inserts (B1-B53) had sequences
derived from different parts of the B4 sequence. No clones
containing only 3' non-repetitive sequences were identified.
Searching GenBank™ revealed no related full-length cDNA but
numerous related human ESTs (including Accession Numbers:
AI566650, AI537678, AI276341, AI923224, AI276341, AU158364,
AU140211, AK024365) and one mouse EST (AK003577) were
detected. With minor exceptions, these sequences were
identical to those derived for B4. The nucleotide sequence
of B4 was designated CA125/MUC16.

Chromosomal Location of CA125/MUC16 Sequences

Comparison of the B4 sequence with the working draft version
of the human genome, available from the NCBI, located
homologous sequences on chromosome 19 (p13.3 region). As
sequencing of this region is incomplete and presently
consists of numerous unordered segments of varying lengths,
more complete genomic information must await the
availability of further sequencing data.

Analysis of the Deduced Amino Acid Sequence of CA125/MUC16

The nucleotide sequence was conceptually translated into an amino
acid sequence assuming initiation at the ATG of the β-galactosidase
gene in the vector. The deduced amino acid sequence of 1890
amino acids (Fig. 3) suggested a mucin-type molecule. It had
an amino acid composition that was moderately high in serine
(8.9%), threonine (12.5%) and proline (8.8%); this composition
is very similar to that of the purified CA125 sample used in
this study (Table 1), although the proportion of these three
amino acids is lower than in most other mucins. The sequence
contained a large region of 9 tandem repeats (TR) of 165 amino
acids each and a C-terminal non-repetitive region of 537 amino
acids. None of the 9 repeats are identical but numerous
perfectly conserved residues and short sequences are apparent
(Fig. 3). Two conserved cysteine residues within the TRs are
notable. The serine and threonine residues are scattered
throughout the sequence but the TR regions have prominent
clusters of Ser and Thr, often with adjacent Pro residues, which
is a common feature of O-glycosylation sites (19), e.g.
SSVPTTSTP (47-55 and 671-679) and SSVSTTSTTSTP (1139-1147).
These characteristics are typical of mucins. The high Leu
content of this sequence is, however, not found in other cloned
mucins. Other features of interest include a sequence of
hydrophobic amino acids (25 residues) towards the C-terminal
end (presumably representing a transmembrane region) and a
short 31-amino-acid cytoplasmic tail. This region also
contains a consensus tyrosine phosphorylation site (RRKKEGEY;
refs. 20, 21). Numerous potential N-linked glycosylation sites
occur in both the TR and non-TR regions (Fig. 3).
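
The percentage composition quoted above for serine, threonine and proline is a simple residue count over the deduced sequence. The following is a minimal sketch of that calculation in Python. The function name `mole_percent` is an assumption made for illustration, and the input is only the short SSVPTTSTP motif quoted in the text, not the full Fig. 3 sequence.

```python
from collections import Counter


def mole_percent(protein: str) -> dict[str, float]:
    """Return the percentage of each residue in a one-letter protein string."""
    seq = protein.upper().replace("*", "")  # drop any stop symbol
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Short motif quoted in the text; used here only as a small example input.
comp = mole_percent("SSVPTTSTP")
for aa in ("S", "T", "P"):
    print(aa, round(comp[aa], 1))
```

Applied to a full deduced sequence, the same function would give figures comparable to the deduced columns of Table 1, bearing in mind that acid hydrolysis of a purified protein reports Asn/Asp and Gln/Glu as combined values.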

Northern Blotting 

mRNA from a panel of ten CA125+ and CA125- cell lines was
screened with a probe derived from the tandem repeat region
of MUC16. Three of the cell lines gave positive blots and 7
were unreactive (Fig. 4). The polydisperse pattern obtained
is typical of that observed with other mucin mRNAs. These
data corresponded to the expression of CA125 antigen on the
cell lines as determined by serological analysis with
antibodies to CA125 (mAbs OC125 and VK-8). The strongest
signal was given by mRNA from OVCAR-3 (lane 5), the cell
line from which the CA125 was purified and the cDNA library
was produced.

Peptide Sequences Derived from CA125 Antigen

Purified CA125 was deglycosylated by treatment with
anhydrous HF at room temperature for 3 hrs (22). Two
sequences were obtained from a tryptic digest of the
HF-treated sample after SDS-PAGE and transfer of the 25-35 kDa
region to a nitrocellulose membrane (22). The product was
also digested with Lys-C in guanidinium hydrochloride;
peptides were isolated by microbore HPLC, and four peptides
were successfully sequenced (Table 2). Five of these six
peptides corresponded to sequences within the TR and one to
a sequence in the C-terminal region of the deduced MUC16
sequence (Table 2).

Table 2. Amino Acid Sequences Derived from Purified CA125

Sequence             Position in CA125/MUC16 sequence

By Lys-C digestion
AQPGTTNYQRNK         1722-1733
SPRLDR               1098-1113
PLFK                 120-123 and other locations
PGL                  7-9 and other locations

By trypsin digestion
KAQPGTTNYQRN         1721-1732
RTPDTSTMHLATSRT      833-847
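
Positions such as those in Table 2 are obtained by locating each sequenced peptide within the deduced protein sequence and reporting 1-based, inclusive residue numbers. A minimal sketch of that lookup is shown below. The helper `find_peptide` and the string `toy_protein` are hypothetical and stand in for the Fig. 3 sequence, which is not reproduced here.

```python
def find_peptide(protein: str, peptide: str) -> list[tuple[int, int]]:
    """Return every (start, end) match, 1-based and inclusive."""
    hits = []
    start = protein.find(peptide)
    while start != -1:
        hits.append((start + 1, start + len(peptide)))
        start = protein.find(peptide, start + 1)  # step by one to allow overlaps
    return hits


# Hypothetical toy sequence for illustration only (not the Fig. 3 sequence).
toy_protein = "MKPGLAAPLFKTTPGLKAQPGTTNYQRNK"
print(find_peptide(toy_protein, "PGL"))
print(find_peptide(toy_protein, "AQPGTTNYQRNK"))
```

Short peptides such as PGL or PLFK can match at several places, which is why the table lists them with "and other locations"; longer peptides are far more likely to give a single, informative match.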

EXPRESSION ANALYSIS OF CA125 NUCLEOTIDE CLONE (FIG. 9)

This figure is the result of an expression experiment that
confirms that the sequence actually codes for CA125, as
recognized by standard antibodies.

Method 

Clone B53 (in pCMV-tag vector) was transfected into SK-OV-3
(CA125-negative cell line) with Lipofectamine Plus reagent.
Stable clones were selected with neomycin. Cells were
radiolabeled with glucosamine, immunoprecipitated with
antibodies and the products analyzed by SDS-PAGE and
autoradiography.

Result 

Lane 1 (mAb OC125) and lane 2 (mAb VK-8) have bands at the
top of the gel showing the presence of CA125 antigen in the
transfected cells. No bands were obtained with normal mouse
serum (negative control).

This result proves that the cloned nucleotide sequence
contains the information for coding for the CA125 antigen.

DISCUSSION

Based on the following evidence, the cloned MUC16 sequence
is a strong candidate for being the cDNA for the peptide
core of the CA125 antigen: (i) the CA125 antigen used in
the study was isolated by affinity chromatography on an
anti-CA125 monoclonal antibody column and was highly
purified; (ii) peptides isolated from the purified CA125
sample corresponded to sequences in the cloned MUC16
sequence; and (iii) MUC16 mRNA levels in a panel of cancer
cell lines, as determined by Northern blotting, correlated
with the expression of CA125 in the cell lines as determined
serologically. Moreover, this result supports earlier
biochemical studies that had concluded that CA125 antigen is
a mucin-type molecule (15). The cloned sequence is
therefore designated as CA125/MUC16. This gene has been
provisionally localized to chromosome 19p13.3. Initially
reported sequences of mucins are rarely full length because
of the extremely large size of mucin mRNAs and, not
unexpectedly, no apparent 5' initiation signal is evident in
the CA125/MUC16 cDNA sequence. The sequence is believed to
be complete at the 3'-end as a stop codon, a polyadenylation
site and a poly A tail have been identified (Fig. 2).

Mucins are notoriously difficult to clone because of their
complex structure and high degree of glycosylation. Most
successful cloning efforts have resulted from screening cDNA
libraries with a polyclonal antiserum produced to the
deglycosylated mucin (reviewed in 23-27). Thirteen human
mucins have been cloned or partially cloned to date (MUC-1,
-2, -3, -4, -5AC, -5B, -6, -7, -8, -9, -11, -12 and -13;
refs. 23-29). In this study, however, a polyclonal
antiserum to the native mucin was used to isolate a cDNA
corresponding to the peptide moiety of CA125/MUC16 antigen.
This approach may have been successful because of the
relatively low content of serine and threonine (representing
potential O-glycosylation sites) in CA125/MUC16 in
comparison with most other mucins. The high degree of
purity of the isolated antigen, as well as the use of a
highly absorbed antiserum and the high expression of CA125
in the OVCAR-3 cell line, used to produce the cDNA library,
may also have been key factors in obtaining positive clones.

The deduced amino acid sequence of CA125/MUC16 resembles
other mucins in having serine, threonine and proline as
major amino acids; however, its high content of leucine is
characteristic of MUC16. The presence of tandem repeats is
also typical of mucins but the length of the repeat units
(156 amino acids) is unusual, with only MUC6 having longer
tandem repeats (30). Nine TRs have been identified thus
far, with the last repeat being shorter than the others.
The amino acid sequences in the TRs are not perfectly
conserved, although 81 positions have conserved amino acids
and certain motifs, e.g. GPLYSCRLTLLR, ELGPYTL, FTLNFTIXNL
and PGSRKFNXT, are found in all or most of the TRs. Two
closely spaced cysteine residues (20 amino acids apart),
which could form interchain disulfide bonded loops in the
structure, are also perfectly conserved.

Serine and threonine residues, representing potential
O-glycosylation sites, are scattered throughout the sequence
but blocks of clustered Ser and Thr residues are evident in
the TR region. These regions have adjacent or nearby Pro
residues, a motif that is frequently found in
O-glycosylation sites (19). One short serine/threonine-rich
sequence (PTSSSST) is also found in the C-terminal non-TR
region. Numerous potential N-glycosylation sites
(Asn-X-Ser/Thr, where X is any amino acid except Pro) are also
found in the sequence, including two that are perfectly
conserved in the TR region. It is unlikely, however, that
many of these sites are used as the content of N-linked
glycan chains in purified CA125 is very low (15). It is
also interesting to note that the sequence contains numerous
lysine and arginine residues that are remote from the
postulated O-glycosylation sites and which could explain the
sensitivity of CA125 to trypsin digestion (16). Searching
for conserved domains in the NCBI Blast site revealed the
presence of six SEA domains in the deduced protein sequence.
The significance of this finding is unclear. Five of the
domains are in the tandem repeat region and one is in the
non-tandem repeat region (amino acids 1709-1768). SEA
domains were originally described as being characteristic of
membrane-bound proteins with high levels of O-glycosylation
(31); CA125/MUC16 certainly fits this description.
Recently, it has been suggested that they also designate
regions susceptible to proteolytic cleavage (32).
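
The N-glycosylation sequon named in this paragraph (Asn-X-Ser/Thr, where X is any amino acid except Pro) can be located with a short pattern search. The sketch below is illustrative only: the names `SEQUON` and `n_glyc_sites` and the toy input string are assumptions, and a match marks a potential site, not evidence that the site is actually glycosylated.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead keeps matches one residue wide so overlapping sequons are found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glyc_sites(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical toy sequence, not taken from the deduced CA125/MUC16 sequence.
toy = "AANFTLLNPSGGNGSNNTT"
print(n_glyc_sites(toy))
```

In the toy string the Asn at position 8 is skipped because it is followed by Pro, which is the exclusion stated in the text.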

Two features of the non-TR region are particularly
interesting. First is the presence of a 25-amino-acid
block of hydrophobic amino acids which could represent a
membrane-spanning region. Transmembrane (TM) motifs have
been found in five other mucins (MUC-1, -3, -4, -12 and -13).
The remainder of the mucins that have been cloned lack TM
regions and instead have cysteine-rich regions with homology
to von Willebrand factor (27). Members of this family of
mucins are secreted and form gels that protect and lubricate
epithelial tissues. CA125 is also secreted from ovarian
tumors and cell lines but the mechanism for its secretion is
unclear. Two possibilities can be suggested: (i) a
proteolytic event, possibly in the C-terminal SEA domain,
cleaves off the luminal N-terminal domain (as in MUC1, refs.
33, 34) or (ii) alternatively spliced mRNAs are generated
that lack the TM region. Indeed, recent sequencing of
clones B30 and B22 indicates the existence of such sequences
(data not shown). The second feature of interest in the
non-TR sequence is a short cytoplasmic tail (31 amino acids)
that contains a putative tyrosine phosphorylation site
(RRKKEGEY). This sequence is conserved in the translated
mouse EST (AK003577) that has homology with CA125/MUC16 at
the C-terminal end. MUC-1 has several tyrosine residues in
its cytoplasmic tail and at least one of these is
phosphorylated in vivo (35, 36). One of the Tyr residues in
MUC1 occurs in a YTNP sequence, a motif that is responsible
for binding to SH2 domains in proteins involved in
intracellular signaling. The putative phosphorylation site
found in CA125/MUC16 was first recognized in src family
proteins (19, 20). Whether or not this tyrosine residue is
phosphorylated in CA125 antigen is not known. Fendrick et
al. (37) reported the presence of phosphate in CA125 from
WISH cells by labeling with 32PO4 and immunoprecipitation
analysis but concluded that the phosphorylation site(s) are
on Ser or Thr. Significantly, however, the secretion of
CA125 is stimulated by epidermal growth factor (EGF),
presumably through the EGF receptor which is a well-known
tyrosine kinase (37). The possibility that CA125/MUC16 is
phosphorylated on tyrosine and is involved in intracellular
signaling needs further investigation. Interestingly, no
EGF domains, which are found in some other mucins (MUC3,
MUC4, MUC12 and 13), were located in CA125 (MUC16).

The molecular cloning of CA125 antigen opens the way to a
better understanding of this important antigen, including
its physiological function and its role in the biology of
ovarian cancer. Of immediate interest will be the
identification of the epitope(s) recognized by the various
monoclonal antibodies that recognize CA125 (38). The
identification of tandem repeats in the MUC16/CA125
structure is consistent with the use of a single monoclonal
antibody in double-determinant assays for CA125 levels,
which would indicate that the antigen has multiple,
identical epitopes (2). Such studies could lead to
improvements in the CA125 assay for the detection of ovarian
cancer.

Second Series Of Experiments

Identification of a form of the CA125 ovarian cancer antigen 
(MUC16B) lacking a transmembrane sequence 

CA125 antigen is overexpressed in the majority of human
ovarian carcinomas and is released into the blood stream
where it can be detected with suitable immunological assays
(1). Approximately 80% of patients with ovarian cancer have
elevated serum CA125 levels and the measurement of these
levels is a valuable tool for monitoring the clinical status
of ovarian cancer patients (2,3).

Despite the widespread use of CA125 as a serum marker, until
recently, very little information was available on the
molecular nature of the CA125 antigen. Biochemical studies
had indicated that the antigen is a large, highly
glycosylated glycoprotein with mucin-like characteristics
(4-6). This suggestion has now been confirmed by the
molecular cloning of CA125 (gene designation: MUC16) by the
inventors (7,8) and O'Brien and coworkers (9). Both groups
reported a long DNA species that coded for a protein with a
large number of partially conserved, 156-amino-acid-long
tandem repeat (TR) sequences. These tandem repeats contain a
serine, threonine and proline-rich (S/T-rich) area that is a
potential region of O-glycosylation. The molecule also
contains a C-terminal non-TR region, a potential
membrane-spanning sequence and a short cytoplasmic tail.
O'Brien et al. (9) also reported a large N-terminal
non-repetitive S/T/P-rich region in CA125.

The presence of a membrane-spanning region in MUC16/CA125
raises the question as to the source of serum CA125 antigen.
One possibility is that cell-bound CA125 is cleaved by a
protease(s) and released into the surrounding medium. In
support of this mechanism is the presence in the molecule of
SEA motifs which are possible protease-sensitive sites
(7,9). Another, not mutually exclusive, explanation is that
MUC16/CA125 is also synthesised as a form lacking a
transmembrane region that could be directly secreted from cells.
During the original cloning of MUC16/CA125 we had isolated a
small number of cDNA clones that appeared to differ from the
reported clone (B4) in having a different 3' nucleotide
sequence. We now show that these species represent a second
form of MUC16/CA125 lacking a C-terminal membrane-spanning
region that could be a secreted form of the antigen. This
species (gene designation: MUC16B) also has a long
serine/threonine-rich N-terminal sequence.

EXPERIMENTAL PROCEDURES
Materials and Methods 

The isolation of cDNA clones B4, B30 and B22 in the pBK-CMV
vector has been described (7). Human tumor cell lines
OVCAR3, SK-OV-8, COLO316, 2774, SK-OV-3 and SK-OV-8 (ovarian
cancer cell lines), MCF-7 (breast cancer), IMR-32
(neuroblastoma), MKN45 (gastric cancer), and MCA (sarcoma)
and their CA125 status have been described (7).

RT-PCR procedure and cDNA sequencing 

Messenger RNA was isolated from cell pellets using a
FastTrack 2.0 kit (Invitrogen Life Technologies, Carlsbad,
CA). cDNA was then synthesised using a Superscript First
Strand Synthesis kit as described by the manufacturer
(Invitrogen). RT-PCR was performed as follows: 2 µl cDNA,
0.2 mM dNTP mix, 4 mM MgCl2, 0.4 to 1 µM forward or reverse
primers and 2.5 U Platinum Taq DNA Polymerase (Invitrogen)
were mixed in a total volume of 50 µl and the samples were
cycled as follows: 94°C for 1 min., 25-35 cycles of 94°C for
30 secs., 54-65°C for 30 secs. and 72°C for 30 secs. to 3 min.,
and a final cycle of 94°C for 5 min. For the PCR of longer
products (> 5 kb) the LA PCR kit from Takara Shuzo Co. was
used under the following conditions: 94°C for 1 min., followed
by 30 cycles of 94°C for 20 secs., 60°C for 30 secs. and 72°C
for 7 or 10 min. and a final cycle of 94°C for 20 secs., 55 or
60°C for 30 secs., and 72°C for 10 min. RT-PCR products were
analyzed by gel electrophoresis in 0.8 or 1.0% agarose in
Tris-acetate-EDTA and stained with ethidium bromide.

For sequencing, the PCR product was cloned into the Topo TA
cloning vector (Invitrogen). Inserts were sequenced
initially with T3 and T7 primers and then with suitable
forward and reverse primers designed according to the
derived sequence. Sequencing was performed either by our own
sequencing facility or by the Cornell University Facility
using a BigDye Terminator Primer Sequencing Kit (Perkin
Elmer/ABI) in ABI 3700 or ABI 377 DNA sequenators. The
sequences were aligned visually for the repeat region
sequences and with the aid of Vector NTI for other sequences.

3' and 5' RACE procedures

These procedures were performed with the First Choice
RLM-RACE kit (Ambion Co., Austin TX) using suitable forward
primers for the 3' and reverse primers for the 5' region,
respectively. For the 5' RACE the outer gene-specific primer
was 5'-TCACAGTCCCTACATTGACTA-3' and the inner primer was
5'-CATGGCACATCTCCAGGGT-3'. The products were cloned into TA
vector and sequenced as described above.
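
As a quick sanity check on the two gene-specific 5' RACE primers quoted above, their length, GC content and a rough melting temperature can be computed. The sketch below uses the Wallace rule of thumb (2°C per A or T, 4°C per G or C), which is only an approximation intended for short oligonucleotides and is not stated in the source; the function names are assumptions made for illustration.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc != len(s):
        raise ValueError("primer contains a non-ACGT character")
    return 2 * at + 4 * gc


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    s = primer.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


# Primer sequences as given in the text for the 5' RACE procedure.
outer = "TCACAGTCCCTACATTGACTA"
inner = "CATGGCACATCTCCAGGGT"
for name, p in (("outer", outer), ("inner", inner)):
    print(name, len(p), round(gc_percent(p), 1), wallace_tm(p))
```

By this crude estimate the two primers have matching melting temperatures; a nearest-neighbor calculation would be the usual choice for actual primer design.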

RESULTS 

Cloning and sequencing of B30 cDNA

During the original expression cloning of MUC16 (7) we
observed that the majority of the clones detected by
screening the cDNA library with a rabbit antiserum were
shorter forms of the longest clone (B4) reported (7) and
contained varying numbers of TRs, a non-TR region, a
potential TM region and a cytoplasmic tail. However, a few
clones were isolated that appeared to be different in that
they lacked a restriction enzyme site (Xho) present in the
B4 family of inserts. The cDNA from one of these clones
(B30) was completely sequenced using the T3 primer of the
vector initially and, subsequently, new forward and reverse
primers derived from the less conserved regions of the new
sequence. The B30 insert had a total of 4103 bp with a stop
codon at 3593 bp. This was followed by a 3' non-translated
region and, finally, a poly A sequence. Despite the presence
of a poly A sequence, no obvious polyadenylation site was
observed (Fig. 10). Clone B22 was partially sequenced and
shown to be a shorter (2432 bp) form identical to the 3'
sequence of B30.

Conceptual translation of the B30 sequence indicated a
protein composed entirely of 7.7 TRs of 156 amino acids
each. The 4.5 C-terminal repeats were identical to sequences
found in the B4 clone and three new partially conserved TRs
were detected N-terminal to the B4 sequence. The new repeats
contained the potential cysteine loop, the 2 conserved
N-glycosylation sites and the serine/threonine-rich region
found in clone B4 of MUC16. No non-TR, transmembrane or
cytoplasmic sequences were present in this new species of
MUC16. Searching the NCBI database with this sequence
yielded two ESTs (BE005912 and BI016218) corresponding to
repeat number 3 in the B30 sequence. Surprisingly, no ESTs,
or even genomic sequences, corresponding to the
non-translated 3' region of B30 were detected in the NCBI
databases. In order to confirm that the new form of MUC16
was not a cloning artifact, 3' RACE was performed with RNA
from the OVCAR3 cell line. Sequences corresponding to the
last repeat and the untranslated region were identified
(data not shown). We also examined a panel of cancer cells
for transcripts corresponding to the 3' region by RT-PCR
using primers from repeat 8 and the 3' end of the
untranslated region of B30. PCR products were found only
with mRNA from cells known to express CA125, again
confirming the relationship of B30 to CA125.

Complete sequence of MUC16B/CA125

Searching the NCBI genomic database with sequences derived
from B30 indicated that numerous sequences related to this
species were located on a genomic sequence file designated
NT_025133.6 (Fig. 13). At present (March 2002), this region,
located on chromosome 19 p13.3/p13.2, consists of 53
unordered sequences of varying length. These data do not
allow the complete sequence of MUC16 to be easily assembled;
however, by designing suitable RT-PCR primers from the
genomic sequence it was possible to amplify and sequence
cDNA that extended the B30 sequence by 6.5 partially
conserved tandem repeat units (Figs. 11 and 12). This
results in the identification of a total of 14 repeats in
the new MUC16 sequence. Adjacent to the first exon of the
5'-most repeat sequence in NT_025133.6 we noticed a very
long potential open reading frame. This region does not
contain any repeat sequences but is rich in serine,
threonine and proline residues. Also, in NT_025133.6 we
observed a short putative exon containing the ATG sequence
suggested by O'Brien et al. (9) to be the initiating codon
of CA125 (Fig. 13). Again by designing suitable primers in
this region, PCR products corresponding to this new 5'
region were cloned and sequenced. The NCBI database contains
ESTs corresponding to portions of the 5' region of this
sequence (AK056791, AK056791 and AF41442). One of these ESTs
extended into the 5' region beyond the ATG designated by
O'Brien et al. (9). In fact NT_025133.6 contains an
extremely long potential open reading frame (positions
176,053-179,693) corresponding to this region. The Celera
public access database also contains genomic sequence for
this region and, significantly, has an extremely long
hypothetical transcript sequence (hCT1645865) containing all
the putative exons in the 176,053-179,693 and 139,330-158,760
b.p. regions of NT_025133.6. Primers were also designed to
sequence these regions and by application of RT-PCR to
OVCAR-3 mRNA it was possible to confirm these sequences.
Only minor differences were found between the experimentally
derived sequence and the database sequences, except for
numerous differences between the published data and our
sequence in the 3' region of the serine/threonine-rich
region where it joins the tandem repeat region. Because this
long S/T/P-rich coding region has numerous ATG codons which
could serve as initiation sites for translation (some of
them fitting a Kozak consensus motif, ref. 10), it was
difficult to pick a likely site. Application of 5' RACE with
a series of primers in different locations in the sequence
finally yielded a primer that gave a clear cDNA product, and
sequencing of this product indicated a start site at
position 261 (Figs. 11 and 12). This ATG is located in a
classical Kozak box. To confirm that the 5' S/T/P-coding
region was in fact related to the tandem repeat region and
codes for the CA125 antigen, we performed RT-PCR on mRNA
from a panel of cell lines (as we had done for the 3' end)
with primers corresponding to a sequence close to the 5'
end; the result showed a complete correlation between
generation of the RT-PCR product and expression of CA125 in
these cell lines.

Conceptual translation of the assembled nucleotide sequence
(18405 bp) demonstrated a protein of 5851 amino acids with
an extremely long (3650 amino acids) S/T/P-rich N-terminal
region (containing 17.2% serine, 19.5% threonine and 9.0%
proline) followed by a region of 14 partially conserved
repeats of 156 amino acids each as described above
(Fig. 12). The sequence terminated after one of the
S/T/P-rich regions in the last TR with no hydrophobic
C-terminal transmembrane region being observed.
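
Conceptual translation, as used here, means reading the cDNA in triplets from a chosen start position with the standard genetic code until the first in-frame stop codon. A minimal sketch follows. The helper `translate_from` and the short `toy_cdna` string are hypothetical, and the code does not reproduce the Fig. 11 sequence or the software the inventors used.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from(cdna: str, start: int) -> str:
    """Translate from a 1-based start position up to the first stop codon."""
    seq = cdna.upper()
    protein = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy cDNA with an ATG at position 7 (illustration only).
toy_cdna = "GGCACCATGGCTTCTACCCCATAAGGG"
print(translate_from(toy_cdna, 7))
```

For scale, a 5851-residue protein needs 3 x 5851 = 17,553 coding nucleotides plus a 3-nucleotide stop codon, which fits within the 18405 bp assembled sequence once the untranslated regions are allowed for.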

DISCUSSION

Using a combination of expression cloning and RT-PCR
approaches we have identified a new species of CA125
(designated MUC16B) that has a long serine/threonine-rich
N-terminal region and a C-terminal region of 14 tandem repeats
but no apparent transmembrane region. This product could
therefore be a secreted form of CA125 although no secretory
peptide sequence is present at the N-terminus. The tandem
repeat region is similar in construction to the repeats
previously observed in MUC16/CA125. These repeats contain a
small region rich in serine and threonine which could
represent O-glycosylation sites. The N-terminal region has
numerous serine and threonine residues scattered through the
sequence and these could also be O-glycosylated. CA125 is
known to be highly glycosylated (77% by weight) and most of
this consists of O-glycosylated chains (4). Two conserved
potential N-glycosylated sites occur in each tandem repeat
and these could also contribute to the carbohydrate content
of CA125, although this level is probably quite low (4).

At present it is unclear as to whether the CA125 molecules
identified by the inventors (7,8) and O'Brien et al. (9) have
the same long N-terminal sequence. O'Brien et al. (9)
described an N-terminal sequence of 1638 amino acids in contrast
to the XXX amino acids described here for MUC16B. However, the
S/T/P-rich region was connected to the TR regions and the
non-TR, transmembrane and cytoplasmic regions similar to those
reported by us in MUC16/CA125. Using 5' RACE they detected an
initiating methionine (at position 6435 in Fig. 11) whereas we
could detect such a site only at position 262. Also unclear is
whether either of the N-terminal S/T/P-rich sequences is
present in the MUC16/CA125 species reported previously, as clone
B4 was not complete at the 5' end (7). We were unable to
generate products by performing RT-PCR with primers located in
the MUC16B repeat region and in the 3' portion of the MUC16
tandem repeats not found in MUC16B, indicating that MUC16 and
MUC16B have different repeat sequences at their 5'-end and
possibly, therefore, a shorter or different S/T-rich region.
Such a situation may account for the larger number of repeats
that were identified by O'Brien et al. (9) and those that can
be found in the genome data bases and not in MUC16B.

MUC16B/CA125 is an extremely long molecule with a peptide
chain of 5851 amino acids and an Mr of about 600,000. Many
other cloned mucins (11,12) also have extremely long peptide
sequences, e.g. MUC5B has 5662 amino acids and an Mr of
about 600,000 (13). By pulse-chase experiments we had
previously identified a putative CA125 precursor species of
about 400 kDa which, given the uncertainties inherent in
very high molecular sizes determined by SDS-PAGE, is
consistent with this result (5). It is also interesting to
note that the precursor consisted of a doublet of two
closely spaced species on SDS-PAGE which could correspond to
MUC16 and MUC16B (5).
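
The quoted Mr of about 600,000 for the unglycosylated peptide chain can be checked roughly from the residue count. The sketch below assumes an average residue mass of 110 Da, a common rule of thumb that is not taken from the source; the constant and function names are illustrative. Because serine, threonine and proline are lighter than the average residue, the true peptide mass of an S/T/P-rich chain would be expected to fall somewhat below this estimate, and the figure excludes carbohydrate entirely.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass (assumption)


def approx_peptide_mass(n_residues: int, avg: float = AVG_RESIDUE_MASS_DA) -> float:
    """Estimate the mass in daltons of an unmodified peptide chain."""
    if n_residues < 0:
        raise ValueError("residue count must be non-negative")
    return n_residues * avg


# Residue counts as quoted in the text.
for name, n in (("MUC16B/CA125", 5851), ("MUC5B", 5662)):
    print(name, round(approx_peptide_mass(n) / 1000), "kDa")
```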

Although MUC16B/CA125 has many of the attributes expected of
a mucin species (i.e. large size, high serine, threonine and
proline content, high level of O-glycosylation and presence
of tandem repeats) it also has some unique features. These
include the presence of potential cysteine loops in the
repeat region and the segregation of the O-glycosylation
sites into a small region of each repeat. Another unusual
feature is that the repeat region is not coded by one long
exon; rather each repeat unit contains 5 small exons
[O'Brien et al. (9) and our unreported data]. In CA125 the
longest exons are found at the 5' end and code for a
non-repeat serine/threonine-rich region. Because of its large
size CA125 is extremely difficult to isolate in an intact
form from biological materials. In our original purification
of CA125 we described an extremely large species migrating
in the stacking gel of an SDS-PAGE gel (4), whereas
subsequently we found smaller species migrating mainly in
the upper region of the separating gel (7). Recently, in a
report from the Third ISOBM Workshop (14) it was reported
that CA125 can be degraded by sonication procedures, as well
as by proteolytic digestion.

Another feature of CA125 that still needs to be completely
elucidated is the location in the molecule of the
antibody-detected epitopes. Presently available data indicate
that they are mainly located in the tandem repeat regions of the
molecule (8, 9) and this would be consistent with the ability
of a single antibody to be useful in sandwich assays (1).
Further work on this problem will be needed to delineate
the structures of the epitopes and to determine whether more
specific assays for CA125 can be devised. The molecular
cloning of CA125 also opens up approaches to determining the
function of CA125 and an understanding of its role in
ovarian malignancy.

1. An isolated nucleic acid molecule comprising sequences
encoding the CA125 protein or a portion thereof.

2. The gene encoding the CA125 protein.

3. The isolated nucleic acid molecule of claim 1
comprising sequence set forth in Figure 6 and the
corresponding CA125 protein comprising sequence set
forth in Figure 5.

4. The isolated nucleic acid molecule of claim 1
comprising sequence set forth in Figure 7 and the
corresponding CA125 protein sequence set forth in
Figure 8.

5. The nucleic acid of claim 1 comprising sequence set
forth in Figure 11.

6. The nucleic acid of claim 1 encoding protein comprising
at least a portion of the amino acid sequence set forth
in Figure 12.

7. The gene of claim 2 comprising sequence set forth in
Figure 10.

8. The isolated nucleic acid molecules of claim 1, 2, 3, 
4, 5, 6, or 1, wherein the nucleic acid is RNA, cDNA, 

30 genomic DNA, or synthetic DNA. 

9. A vector comprising the nucleic acid molecule of claim 
1, 2, 3, 4, 5, 6, 1, or 8. 

10. The vector of claim 9, designated as pBK-CMV-B4
comprising sequence set forth in Figure 6 and the
corresponding CA125 protein comprising sequence set
forth in Figure 5.
11. The vector of claim 9, designated as pBK-CMV-B30
comprising sequence set forth in Figure 7 and the
corresponding CA125 protein comprising sequence set
forth in Figure 8.

12. The vector of claim 9, designated as pCMV-Tag-B4
comprising sequence set forth in Figure 6 and the
corresponding CA125 protein comprising sequence set
forth in Figure 5.

13. The vector of claim 9, designated as pCMV-Tag-B30
comprising sequence set forth in Figure 7 and the
corresponding CA125 protein comprising sequence set
forth in Figure 8.

14. An expression system comprising the vector of claim 9. 

15. The expression system of claim 14, wherein the system
is a eukaryotic or prokaryotic system.

16. A method for producing CA125 protein comprising the 
expression system of claim 14. 

17. An isolated nucleic acid molecule comprising sequence
capable of specifically hybridizing to the sequences of
claim 1 or 2.

18. The nucleic acid molecule of claim 17 capable of
inhibiting the expression of the CA125 protein.



19. A method of inhibiting expression of CA125 inside a
cell by vector-directed expression of an RNA able to
hybridize with the RNA of CA125.

20. The nucleic acid molecule of claim 17 or 18 which is at
least a 10mer.

21. The nucleic acid molecule of claim 17 or 18 which is at
least a 20mer.
22. A method to detect ovarian cancer in a subject
comprising steps of:

a) contacting the isolated nucleic acid molecule of
claim 17 with RNA from a sample from the subject
under conditions permitting the formation of a
hybrid complex, and

b) detecting the hybrid complex, wherein a positive
detection indicates the expression of the antigen
and presence of cancer.

23. A method of monitoring ovarian cancer therapy in a
subject comprising steps of:

a) contacting the isolated nucleic acid molecule of
claim 17 with RNA from a sample from the subject
under conditions permitting the formation of a
hybrid complex, and

b) measuring the amount of the hybrid complex,
wherein a decrease in the hybrid complex indicates
the success of therapy.

24. A method for inhibiting the expression of the CA125
protein comprising contacting an appropriate amount of
the nucleic acid molecule of claim 17 or 18 so that
hybridization of the gene or transcript encoding the
CA125 protein will occur, thereby inhibiting the
expression of the protein.

25. A composition comprising the isolated nucleic acid 
molecule of claim 17 or 18. 



26. A vaccine for a cancer which expresses CA125 protein
comprising an appropriate amount of the isolated
nucleic acid molecules of claim 1 or 2.
27. A vaccine for a cancer which expresses CA125 protein
comprising an appropriate amount of an expression
vector with the nucleic acid molecules which, when
expressed, are capable of producing a product which
induces an immune response to CA125 protein.

28. The vaccine of claim 27, wherein the nucleic acid 
molecule comprises sequences encoding human CA125 
protein or a portion thereof. 

29. The vaccine of claim 28, wherein the expressed human
sequence is linked to a carrier.

30. The vaccine of claim 27, wherein the nucleic acid
molecule comprises a nonhuman sequence.

31. The vaccine of claim 27, wherein the nucleic acid 
molecule comprises a primate sequence. 



32. The vaccine of claim 27, wherein the nucleic acid
molecule comprises a murine sequence.

33. The vaccine of claim 27, wherein the nucleic acid 
molecule comprises a synthetic sequence, which, when 
expressed, is capable of producing a product which 
induces an immune response to CA125 protein. 



34. The vaccine of claim 33, wherein the sequence
hybridizes with or is homologous to the sequences
encoding human CA125 protein.

35. The vaccine of claims 26-34, further comprising a 
suitable adjuvant. 



36. The vaccine of claims 26-34, wherein the adjuvant is an
alum.
37. The vaccine of claims 26-36, wherein the cancer is an
ovarian, pancreatic, breast, endometrial, or lung
carcinoma.

38. A method to treat a cancer which expresses CA125 in a
subject comprising administering to the subject an
appropriate amount of the vaccine of claims 26-36.

39. The method of claim 38, wherein the cancer is an
ovarian, pancreatic, breast, endometrial, or lung
carcinoma.

40. A vaccine for a cancer which expresses CA125 comprising
an appropriate amount of the expressed CA125 protein
corresponding to the sequence in claim 1.

41. A vaccine for a cancer which expresses CA125 protein
comprising an appropriate amount of a substance which
induces an immune response to CA125 protein.

42. The vaccine of claim 41, wherein the substance is a
polypeptide or a peptide.

43. The vaccine of claim 42, wherein the polypeptide
comprises sequences encoding human CA125 protein or a
portion thereof.

44. The vaccine of claim 43, wherein the expressed human 
sequence is linked to a carrier. 

45. The vaccine of claim 41, wherein the polypeptide
comprises a nonhuman sequence.

46. The vaccine of claim 45, wherein the polypeptide
comprises a primate sequence.

47. The vaccine of claim 45, wherein the polypeptide 
comprises a murine sequence. 
48. The vaccine of claim 42, wherein the polypeptide
comprises a synthetic sequence, which, when expressed,
is capable of producing a product which induces immune
response to CA125 protein.

49. The vaccine of claims 40-48, further comprising a
suitable adjuvant.

50. The vaccine of claim 49, wherein the adjuvant is an
alum.

51. The vaccine of claims 40-50, wherein the expressed
protein is conjugated to a protein carrier to increase
the immunogenicity.

52. The vaccine of claims 40-51, wherein the cancer is an
ovarian, pancreatic, breast, endometrial, or lung
carcinoma.

53. A method to treat a cancer which expresses CA125 in a
subject comprising administering to the subject an
appropriate amount of the vaccine of claims 40-51.

54. A method to prevent a cancer which expresses CA125 in a
subject comprising administering to the subject an
appropriate amount of the vaccine of claims 40-51.

55. The method of claims 53 or 54, wherein the cancer is an
ovarian, pancreatic, breast, endometrial, or lung
carcinoma.

56. A method for the diagnosis of a cancer which expresses
CA125 by detecting CA125-expressing cells in the blood
or other fluids of patients based on the nucleic acid
sequence which encodes CA125.

57. A method for monitoring the therapy of a cancer which
expresses CA125 by measuring the expression of
CA125-expressing cells in the blood or other fluids of
patients based on the nucleic acid sequence which
encodes CA125, a decrease of either the number of
CA125-expressing cells or level of protein expression
in the cell indicating the success of the therapy.

58. The method of claim 56 or 57, wherein the detection is
based on polymerase chain reaction with appropriate
primers.

59. A method of producing CA125 protein comprising steps of:

a) constructing a vector adapted for expression in a
cell which comprises the regulatory elements
necessary for expression of nucleic acid in the
cell operatively linked to the nucleic acid
encoding the CA125 protein so as to permit
expression thereof;

b) placing the cells of step (a) under conditions
allowing the expression of the CA125 protein; and

c) recovering the CA125 protein so expressed.

60. The method of claim 59, wherein the cell type is 
selected from the group consisting of bacterial cells, 
yeast cells, insect cells, and mammalian cells. 

61. The CA125 protein expressed by the method in claim 59
or 60.

62. A method for production of antibodies against CA125
protein using the protein of claim 61.

63. Antibodies produced by the method of claim 62.

64. A method for monitoring the therapy of cancer which
expresses CA125 using the antibodies of claim 63.

65. A method of diagnosis of cancer which expresses CA125
using the antibodies of claim 63.
66. A method for determining the immunoreactive part of
CA125 comprising contacting antibodies which are known
to be reactive to CA125 with the protein of claim 61.

67. A transgenic nonhuman organism comprising the isolated
nucleic acid molecule of claim 1 or 2.



68. A transgenic nonhuman mammal of claim 67. 

69. A nonhuman organism, wherein the expression of CA125 is 
inhibited. 

70. The nonhuman mammal of claim 69. 

71. The nonhuman mammal of claim 70, wherein the mammal is 
a mouse. 



72. A method for screening a compound for treatment of
cancer which expresses CA125 protein comprising
administering the compound to the transgenic nonhuman
organism of claims 67-71, a decrease in expression of
CA125 protein indicating that the compound may be
useful for treatment of the cancer.
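
Claims 17 to 21 recite nucleic acid molecules that hybridize to the CA125 sequences and are at least a 10mer or a 20mer. As a rough illustration only, the sketch below derives a candidate antisense oligomer as the reverse complement of a chosen target segment. The function names, the choice of the first 20 bases of the Figure 6 sequence as the target, and the segment position are assumptions made for this example; the sketch does not assess hybridization specificity or inhibitory activity.

```python
# Hypothetical helper names; not part of the original disclosure.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def antisense_oligo(target: str, start: int, length: int) -> str:
    """Return an oligomer complementary to target[start:start+length]."""
    segment = target[start:start + length]
    if len(segment) < length:
        raise ValueError("target segment is shorter than the requested length")
    return reverse_complement(segment).upper()


# First 30 bases of the Figure 6 sequence (positions 1-30), used as an example target.
figure6_start = "cgcgttgatcccatcggacctggactggac"

# A 20mer complementary to positions 1-20 of the example target.
print(antisense_oligo(figure6_start, 0, 20))
```

The same call with `length=10` would give a 10mer; any real probe or antisense design would additionally need to be screened for specificity against other sequences.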
FIGURE 2

YYQSHLD LEDLQ * 

TACTACCAGTCACACCTAGACCTGGAGGATCTGCAATGACTGGAACTTGCC 5685 

GGTGCCTGGGGTGCCTTTCCCCCAGCCAGGGTCCAAAGAAGCTTGGCTGG 5736 

GGCAGAAATAAACCATATTGGTCGGAAAAAGGAAGGAGAATACAACGTCCA 5787 

GCAACAGTGCCCAGGCTACTACCAGTCCCCCCTAGACCTGGAGGATTTGCA 5838 

ATGACTGGAACTTGCCGGTGCCTGGGGTGCCTTTCCCCCAGCCAGGGTCC 5889

AAAAAAGCTTGGCTGGGGGAA AAATAAAC CCATATTGGTCGGAAAAAAAAAA 5940 

AAAAAAAAAAAAAAAAAAAAAAAAA 5965 
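
The residues printed above the first sequence line of Figure 2 can be checked against the nucleotides beneath them. The following is a minimal sketch, assuming the standard genetic code and reading frame 1 of that line; the function and variable names are illustrative and do not come from the original document.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, stopping after the first stop codon ('*')."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# First sequence line of Figure 2 (ends at position 5685).
figure2_first_line = "TACTACCAGTCACACCTAGACCTGGAGGATCTGCAATGACTGGAACTTGCC"
print(translate(figure2_first_line))  # prints YYQSHLDLEDLQ*
```

The output matches the annotation in the figure: twelve residues followed by the TGA stop codon, with the remaining bases on the line belonging to the 3' untranslated region.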
FIGURE 4
RVDPIGPGLDRERLYWELSQLTNSITELGPYTLDRDSLYVNGFNPWSSVPTTSTPGTS

TVHLATSGTPSSLPGHTAPVPLLIPFTLNFTITNLHYEENMQHPGSRKFNTTERVLQGL

LKPLFKSTSVGPLYSGCRJLTLLRPEKHGAATGVDAICTLRLDPTGPGLDRERLYWELS 

QLTNSVTELGPYTLDRDSLYVNGFTHRSSVPTTSIPGTSAVHLETSGTPASLPGHTAPG 

P^XWFTIJ^TIT^ILQYEEDMRHPGSR^af^mBRV^ 

LLRPEKRGAATGVDTICTHRLDPLNPGLDREQLYWELSKLTRGIIELGPYLLDRGSLY 

VNGFTHKNFWrrSTPGTSTVHLGTSETPSSLPRPIVPGPlXVPFrLNFTIT^ 

RHPGSRKFNTTERVIXJGIXRPIJfKl^IGPLYSSCm-TLLRPEKDI^^ 

PDPQSPGLNREQLYWELSQLTHGITELGPYTLDRDSLYVDGFTHWSPIPTTSTPGTSIV 

NLGTSGIPPSIJPETTATGPIXWFTLNFTITNLQYEEI^GHPGSRK^ 

IJKSTSVGPLYSGCRLTLIJ«»EEI)GVATRVDAICTHRPDPK]DPGLDRQQLYWELSQLT 

HSITELGPYTLDRDSLYVNGFTQRSSWTTSTPGTFTVQPETSETPSSIPGPTATGPVLL 

PFTLNFITimQYEEDMHRPGSRKFNTTERVLQGLLMPLFKNTSVSSLYSGCRLTLLRP 

EKDGAATRVDAVCTHRPDPKSPGII)RERLYWKLSQLTHGITELGPYTLDRHSLYVN 

GFraQSSMTTTRTPDTSTMHIATSRTPASLSGFITASPLLVIJTINFIT^^ 

HPGSRKPNTTERVLQGLLRPWKNTSVGPLYSGCRLTLUIPKKDGAATKVDAICT^ 

PDPKSPGlJDREQLYWELSQLTHSriEUjPYTLDRDSLYVNGFTQRSSVPTTSIPGTPTV 

DLGTSGTPVSKPGPSAASPLLVLFILNFTITNLRYEENMQHPGSRKFlSriTERV^ 

RSLFKSTSVGPLYSGCRLTLLRPEKDGTATGVDAICTHHPDPKSPRLDREQLYWELSQ 

LTHlSflTELGPYALDNDSIJfWGFTHRSSVSTTSTPGTPTVYLGASKTPASIFGPSAA^ 

liHJFTLNFTITISnLRYEENMWPGSRKFl^^ 

RPEKDGEATGVDAICTHRPDPTGPGLDREQLYLELSQLTHSITELGPYTLDRDSLYVN 

GFTHRSSWTTSTGWSBEPFTLNFITNl^RYMADMGQPGSLKFNITDNVMQHLLSPL 

FQRSSLGARYTGCRVIALRSVKNGAETRVDLLCTYLQPLSGPGLPIKQVFHELSQQTH 

Grm.GPYSIX>KDSLYIJSrGYNEPGPDEPPTTPKPATIiriPPLSEATTAMGYHIXTLTL 

MTISNLQYSPDMGKGSATFNSTEGVLQHLLRPLFQKSSMGPFYLGCQLISLRPEKDG 

AATGVDTTCTYHPDPVGPGIJ>IQQLYWEI5QLTHGWQLGFYVLDRDSLFINGYAPQ 

NLSIRGEYQINFHIVNWlvnJSNPDPTSSEYITLLRDIQDKVTT^ 

1SILTMDSVLVTVKALFSSNIJ)PSLVEQWLDKTLNASFHWLGSTYQLVDI^ 

WQPTSSSSTQOTYPNFTITNIJ'YSQDKAQPGTTlSri'QRNKRMEDALN^ 

YFSDCQVSTFRSWNRHHTGVDSIXOTSPIARRVDRVAIYEEFIJR^^ 

IJDRSSVLVDGYSPNRNEPLTGNSDIJ»FWAVinGIAGIXGIJT<XICGVLV^ 

GEYNVQQQCPGYYQSHLDLEDL 
1 cgcgttgatc ccatcggacc tggactggac agagagcggc tatactggga gctgagccag
61 ctgaccaaca gcatcacaga gctgggaccc tacaccctgg atagggacag tctctatgtc 
121 aatggcttca acccttggag ctctgtgcca accaccagca ctcctgggac ctccacagtg 
181 cacctggcaa cctctgggac tccatcctcc ctgcctggcc acacagcccc tgtccctctc 
241 ttgataccat tcaccctcaa ctttaccatc accaacctgc attatgaaga aaacatgcaa 
301 caccctggtt ccaggaagtt caacaccacg gagagggttc tgcagggtct gctcaagccc 
361 ttgttcaaga gcaccagcgt tggccctctg tactctggct gcagactgac cttgctcaga 
421 cctgagaaac atggggcagc cactggagtg gacgccatct gcaccctccg ccttgatccc 
481 actggtcctg gactggacag agagcggcta tactgggagc tgagccagct gaccaacagc 
541 gttacagagc tgggccccta caccctggac agggacagtc tctatgtcaa tggcttcacc 
601 catcggagct ctgtgccaac caccagtatt cctgggacct ctgcagtgca cctggaaacc 
661 tctgggactc cagcctccct ccctggccac acagcccctg gccctctcct ggtgccattc 
721 accctcaact tcactatcac caacctgcag tatgaggagg acatgcgtca ccctggttcc 
781 aggaagttca acaccacgga gagagtcctg cagggtctgc tcaagccctt gttcaagagc
841 accagtgttg gccctctgta ctctggctgc agactgacct tgctcaggcc tgaaaaacgt 
901 ggggcagcca ccggcgtgga caccatctgc actcaccgcc ttgaccctct aaaccctgga 
961 ctggacagag agcagctata ctgggagctg agcaaactga cccgtggcat catcgagctg
1021 ggcccctacc tcctggacag aggcagtctc tatgtcaatg gtttcaccca tcggaacttt 
1081 gtgcccatca ccagcactcc tgggacctcc acagtacacc taggaacctc tgaaactcca 
1141 tcctccctac ctagacccat agtgcctggc cctctcctgg tgccattcac cctcaacttc
1201 accatcacca acttgcagta tgaggaggcc atgcgacacc ctggctccag gaagttcaat
1261 accacggaga gggtcctaca gggtctgctc aggcccttgt tcaagaatac cagtatcggc
1321 cctctgtact ccagctgcag actgaccttg ctcaggccag agaaggacaa ggcagccacc 
1381 agagtggatg ccatctgtac ccaccaccct gaccctcaaa gccctggact gaacagagag 
1441 cagctgtact gggagctgag ccagctgacc cacggcatca ctgagctggg cccctacacc 
1501 ctggacaggg acagtctcta tgtcgatggt ttcactcatt ggagccccat accaaccacc
1561 agcactcctg ggacctccat agtgaacctg ggaacctctg ggatcccacc ttccctccct
1621 gaaactacag ccaccggccc tctcctggtg ccattcacac tcaacttcac catcactaac 
1681 ctacagtatg aggagaacat gggtcaccct ggctccagga agttcaacat cacggagagt 
1741 gttctgcagg gtctgctcaa gcccttgttc aagagcacca gtgttggccc tctgtattct 
1801 ggctgcagac tgaccttgct caggcctgag aaggacggag tagccaccag agtggacgcc 
1861 atctgcaccc accgccctga ccccaaaatc cctgggctag acagacagca gctatactgg 
1921 gagctgagcc agctgaccca cagcatcact gagctgggac cctacaccct ggatagggac 
1981 agtctctatg tcaatggttt cacccagcgg agctctgtgc ccaccaccag cactcctggg 
2041 actttcacag tacagccgga aacctctgag actccatcat ccctccctgg ccccacagcc 
2101 actggccctg tcctgctgcc attcaccctc aattttacca tcattaacct gcagtatgag 
2161 gaggacatgc atcgccctgg ctccaggaag ttcaacacca cggagagggt ccttcagggt 
2221 ctgcttatgc ccttgttcaa gaacaccagt gtcagctctc tgtactctgg ttgcagactg 
2281 accttgctca ggcctgagaa ggatggggca gccaccagag tggatgctgt ctgcacccat 
2341 cgtcctgacc ccaaaagccc tggactggac agagagcggc tgtactggaa gctgagccag 
2401 ctgacccacg gcatcactga gctgggcccc tacaccctgg acaggcacag tctctatgtc 
2461 aatggtttca cccatcagag ctctatgacg accaccagaa ctcctgatac ctccacaatg 

WO 02/092836    PCT/US02/14768

FIGURE 6
(cont.)

2521 cacctggcaa cctcgagaac tccagcctcc ctgtctggac ctacgaccgc cagccctctc 
2581 ctggtgctat tcacaattaa cttcaccatc actaacctgc ggtatgagga gaacatgcat 
2641 caccctggct ctagaaagtt taacaccacg gagagagtcc ttcagggtct gctcaggcct 
2701 gtgttcaaga acaccagtgt tggccctctg tactctggct gcagactgac cttgctcagg 
2761 cccaagaagg atggggcagc caccaaagtg gatgccatct gcacctaccg ccctgatccc 
2821 aaaagccctg gactggacag agagcagcta tactgggagc tgagccagct aacccacagc 
2881 atcactgagc tgggccccta caccctggac agggacagtc tctatgtcaa tggtttcaca 
2941 cagcggagct ctgtgcccac cactagcatt cctgggaccc ccacagtgga cctgggaaca
3001 tctgggactc cagtttctaa acctggtccc tcggctgcca gccctctcct ggtgctattc 
3061 actctcaact tcaccatcac caacctgcgg tatgaggaga acatgcagca ccctggctcc 
3121 aggaagttca acaccacgga gagggtcctt cagggcctgc tcaggtccct gttcaagagc 
3181 accagtgttg gccctctgta ctctggctgc agactgactt tgctcaggcc tgaaaaggat 
3241 gggacagcca ctggagtgga tgccatctgc acccaccacc ctgaccccaa aagccctagg 
3301 ctggacagag agcagctgta ttgggagctg agccagctga cccacaatat cactgagctg 
3361 ggcccctatg ccctggacaa cgacagcctc tttgtcaatg gtttcactca tcggagctct
3421 gtgtccacca ccagcactcc tgggaccccc acagtgtatc tgggagcatc taagactcca 
3481 gcctcgatat ttggcccttc agctgccagc catctcctga tactattcac cctcaacttc 
3541 accatcacta acctgcggta tgaggagaac atgtggcctg gctccaggaa gttcaacact 
3601 acagagaggg tccttcaggg cctgctaagg cccttgttca agaacaccag tgttggccct 
3661 ctgtactctg gctgcaggct gaccttgctc aggccagaga aagatgggga agccaccgga 
3721 gtggatgcca tctgcaccca ccgccctgac cccacaggcc ctgggctgga cagagagcag 
3781 ctgtatttgg agctgagcca gctgacccac agcatcactg agctgggccc ctacacactg 
3841 gacagggaca gtctctatgt caatggtttc acccatcgga gctctgtacc caccaccagc 
3901 accggggtgg tcagcgagga gccattcaca ctgaacttca ccatcaacaa cctgcgctac 
3961 atggcggaca tgggccaacc cggctccctc aagttcaaca tcacagacaa cgtcatgcag 
4021 cacctgctca gtcctttgtt ccagaggagc agcctgggtg cacggtacac aggctgcagg 
4081 gtcatcgcac taaggtctgt gaagaacggt gctgagacac gggtggacct cctctgcacc 
4141 tacctgcagc ccctcagcgg cccaggtctg cctatcaagc aggtgttcca tgagctgagc 
4201 cagcagaccc atggcatcac ccggctgggc ccctactctc tggacaaaga cagcctctac 
4261 cttaacggtt acaatgaacc tggtccagat gagcctccta caactcccaa gccagccacc 
4321 acattcctgc ctcctctgtc agaagccaca acagccatgg ggtaccacct gaagaccctc 
4381 acactcaact tcaccatctc caatctccag tattcaccag atatgggcaa gggctcagct 
4441 acattcaact ccaccgaggg ggtccttcag cacctgctca gacccttgtt ccagaagagc 
4501 agcatgggcc ccttctactt gggttgccaa ctgatctccc tcaggcctga gaaggatggg 
4561 gcagccactg gtgtggacac cacctgcacc taccaccctg accctgtggg ccccgggctg
4621 gacatacagc agctttactg ggagctgagt cagctgaccc atggtgtcac ccaactgggc 
4681 ttctatgtcc tggacaggga tagcctcttc atcaatggct atgcacccca gaatttatca 
4741 atccggggcg agtaccagat aaatttccac attgtcaact ggaacctcag taatccagac 
4801 cccacatcct cagagtacat caccctgctg agggacatcc aggacaaggt caccacactc 
4861 tacaaaggca gtcaactaca tgacacattc cgcttctgcc tggtcaccaa cttgacgatg 
4921 gactccgtgt tggtcactgt caaggcattg ttctcctcca atttggaccc cagcctggtg 
4981 gagcaagtct ttctagataa gaccctgaat gcctcattcc attggctggg ctccacctac 
5041 cagttggtgg acatccatgt gacagaaatg gagtcatcag tttatcaacc aacaagcagc
5101 tccagcaccc agcacttcta cccgaatttc accatcacca acctaccata ttcccaggac 
5161 aaagcccagc caggcaccac caattaccag aggaacaaaa ggaatattga ggatgcgctc 
5221 aaccaactct tccgaaacag cagcatcaag agttattttt ctgactgtca agtttcaaca 
5281 ttcaggtctg tccccaacag gcaccacacc ggggtggact ccctgtgtaa cttctcgcca 
5341 ctggctcgga gagtagacag agttgccatc tatgaggaat ttctgcggat gacccggaat 
5401 ggtacccagc tgcagaactt caccctggac aggagcagtg tccttgtgga tgggtattct 
5461 cccaacagaa atgagccctt aactgggaat tctgaccttc ccttctgggc tgtcatcttc 
5521 atcggcttgg caggactcct gggactcatc acatgcctga tctgcggtgt cctggtgacc 
5581 acccgccggc ggaagaagga aggagaatac aacgtccagc aacagtgccc aggctactac 
5641 cagtcacacc tagacctgga ggatctgcaa tgactggaac ttgccggtgc ctggggtgcc 
5701 tttcccccag ccagggtcca aagaagcttg gctggggcag aaataaacca tattggtcgg 
5761 aaaaaggaag gagaatacaa cgtccagcaa cagtgcccag gctactacca gtccccccta 
5821 gacctggagg atttgcaatg actggaactt gccggtgcct ggggtgcctt tcccccagcc 
5881 agggtccaaa aaagcttggc tggggcaaaa ataaaccata ttggtcggaa aaaaaaaaaa 
5941 aaaaaaaaaa aaaaaaaaaa aaaaa
ATGTTCAAGAACACCAGTGTCGGCCTTCTGTACTCTGGCTGCAGACTGACCTTGCTCA

GGCCTGAGAAGAATGGGGCAGCCACTGGAATGGATGCCATCTGCAGCCACCGTCTTG 

ACCCCAAAAGCCCTGGACTCAACAGAGAGCAGCTGTACTGGGAGCTGAGCCAGCTGA 

CCCATGGCATCAAAGAGCTGGGCCCCTACACCCTGGACAGGAACAGTCTCTATGTCA 

ATGGTTTCACCCATCGGAGCTCTGTGGCCCCCACCAGCACTCCTGGGACCTCCACAGT 

GGACCTTGGGACCTCAGGGACrCCATCXn'CCCrCCCCAGCarCACAACAGCTGTTCCT 

CTCCTGGTGCCGTTCACCCTCAACTTTACCATCACCAATCTGCAGTATGGGGAGGACA 

TGCGTCACCCTGGCTCCAGGAAGTTCAACACCACAGAGAGGGTCCTGCAGGGTCTGCT 

TGGTCCCTTGTTCAAGAACTCCAGTGTCGGCCCTCTGTACTCTGGCTGCAGACTGATCT 

CTCTCAGGTCTGAGAAGGATGGGGCAGCCACTGGAGTGGATGCCATCTGCACCCACC 

ACCTTAACCCTCAAAGCCCTGGACTGGACAGGGAGCAGCTGTACTGGCAGCTGAGCC 

AGATGACCAATGGCATCAAAGAGCTGGGCCCCTACACCCTGGACCGGAACAGTCTCT 

ACGTCAATGGTTTCACCCATCGGAGCTCTGGGCTCACCACCAGCACTCCTTGGACTTC

CACAGTTGACCTTGGAACCTCAGGGACTCCATCCCCCGTCCCCAGCCCCACAACTGCT 

GGCCCTCTCCTGGTGCCATTCACCCTAAACTTCACCATCACCAACCTGCAGTATGAGG 

AGGACATGCATCGCCCTGGATCTAGGAAGTTCAACGCCACAGAGAGGGTCCTGCAGG 

GTCTGCTTAGTCCCATATTCAAGAACTCCAGTGTTGGCCCTCTGTACTCTGGCTGCAG 

ACTGACCTCTCTCAGGCCCGAGAAGGATGGGGCAGCAACTGGAATGGATGCTGTCTG 

CCTCTACCACCCTAATCCCAAAAGACCTGGGCTGGACAGAGAGCAGCTGTACTGGGA

GCTAAGCCAGCTGACCCACAACATCACTGAGCTGGGCCCCTACAGCCTGGACAGGGA

CAGTCTCTATGTCAATGGTTTCACCCATCAGAACTCTGTGCCCACCACCAGTACTCCT 

GGGACCTCCACAGTGTACTGGGCAACCACTGGGACTCCATCCTCCTTCCCCGGCCACA 

CAGAGCCTGGCCCrCrCCTGATACCATTCACirrCAACTITACCATCACXA^ 

TATGAGGAAAACATGCAACACCCTGGTTCCAGGAAGTTCAACACCACGGAGAGGGTT 

CTGCAGGGTCTGCTCAAGCCCTTGTTCAAGAACACCAGTGTTGGCCCTCTGTACTCTG 

GCTGCAGACTGACCTTGCTCAGACCTGAGAAGCAGGAGGCAGCCACTGGAGTGGACA

CCATCTGTACCCACCGCGTTGATCCCATCGGACCTGGACTGGACAGAGAGCGGCTATA 

CTGGGAGCTGAGCCAGCTGACCAACAGCATCACAGAGCTGGGACCCTACACCCTGGA 

TAGGGACAGTCTCTATGTCAATGGCTTCAACCCTTGGAGCTCTGTGCCAACCACCAGC 

ACTCCTGGGACCTCCACAGTGCACCTGGCAACCTCTGGGACTCCATCCTCCCTGCCTG 

GCCACACAGCCCCTGTCX:CTCT(J[TGATACCATTCACCCTCAACITTACCATCACCAAC 

CTGCATTATGAAGAAAACATGCAACACCCTGGTTCCAGGAAGTTCAACACCACGGAG 

AGGGTTCTGCAGGGTCTGCTCAAGCCCTTGTTCAAGAGCACCAGCGTTGGCCCTCTGT

ACTCTGGCTGCAGACTGACCTTGCTCAGACCTGAGAAACATGGGGCAGCCACTGGAG

TGGACGCCATCTGCACCCTCCGCCTTGATCCCACTGGTCCTGGACTGGACAGAGAGCG 

GCTATACTGGGAGCTGAGCCAGCTGACCAACAGCGTTACAGAGCTGGGCCCCTACAC 

CCTGGACAGGGACAGTCTCTATGTCAATGGCTTCACCCATCGGAGCTCTGTGCCAACC

ACCAGTATTCCTGGGACCTCTGCAGTGCACCTGGAAACCTCTGGGACTCCAGCCTCCC 

TCCCTGGCCACACAGCCCCTGGCCCTCTCCTGGTGCCATTCACCCTCAACTTCACTATC 

ACCAACCTGCAGTATGAGGAGGACATGCGTCACCCTGGTTCCAGGAAGTTCAACACC 

ACGGAGAGAGTCCTGCAGGGTCTGCTCAAGCCCTTGTTCAAGAGCACCAGTGTTG 
GCCCTCTGTACTCTGGCTGCAGACTGACCTTGCTCAGGCCTGAAAAACGTGGGGCAGC

CACCGGCGTGGACACCATCTGCACTCACCGCCTTGACCCTCTAAACCCTGGACTGGAC 

AGAGAGCAGCTATACTGGGAGCTGAGCAAACTGACCCGTGGCATCATCGAGCTGGGC 

CCCTACCTCCTGGACAGAGGCAGTCTCTATGTCAATGGTTTCACCCATCGGAACTTTG

TGCCCATCACCAGCACTCCTGGGACCTCCACAGTACACCTAGGAACCTCTGAAACTCC

ATCCTCCCTACCTAGACCCATAGTGCCTGGCCCTCTCCTGGTGCCATTCACCCTCAACT 

TCA(XATCACCAACTTX3CAGTATGAGGAGGCCATGCGACACCCTGGCTCCAGGAAGTT 

CAATACCACGGAGAGGGTCCTACAGGGTCTGCTCAGGCCCTTGTTCAAGAATACCAGT 

ATCGGCCCTCTGTACTCCAGCTGCAGACTGACCTTGCTCAGGCCAGAGAAGGACAAG 

GCAGCCACCAGAGTGGATGCCATCTGTACCCACCACCCTGACCCTCAAAGCCCTGGAC 

TGAACAGAGAGCAGCTGTACTGGGAGCTGAGCCAGCTGACCCACGGCATCACTGAGC 

TGGGCCCCTACACCCTGGACAGGGACAGTCTCTATGTCGATGGTTTCACTCATTGGAG

CCCCATACCAACCACCAGCACTCCTGGGACCTCCATAGTGAACCTGGGAACCTCTGGG 

ATCCCACCTTCCCTCCCTGAAACTACAGCCACCGGCCCTCTCCTGGTGCCATTCACACT 

CAACTTCACCATCACTAACCTACAGTATGAGGAGAACATGGGTCACCCTGGCTCCAGG 

AAGTTCAACATCACGGAGAGTGTTCTGCAGGGTCTGCTCAAGCCCTTGTTCAAGAGCA 

CCAGTGTTGGCCCTCTGTATTCTGGCTGCAGACTGACCTTGCTCAGGCCTGAGAAGGA

CGGAGTAGCCACCAGAGTGGACGCCATCTGCACCCACCGCCCTGACCCCAAAATCCCT 

GGGCTAGACAGACAGCAGCTATACTGGGAGCTGAGCCAGCTGACCCACAGCATCACT 

GAGCTGGGACCCTACACCCTGGATAGGGACAGTCTCTATGTCAATGGTTTCACCCAGC 

GGAGCTCTGTGCCCACCACCAGCAGTGAGTATTCTACTGATGTTCCCATGGCCCCAAT 

CTTACAACAAACTTAGCAGGAGCTGACCCCTATTCATAAGCCCTTATGTCCTTTCCAT 

AAGGGAAGGAACATAGAGGACACAAATTATTCCCCTTCCCCACTGCCCCAGCTAATC 

AGAGTCCCAGCTGAAGCCCCACAGGCAAAAATCCCCATGAATAGTCCCTCCTGCTGGC

ATTACNTTCCATGAGAGCACNTTGCTCCTTTCACTGTTGAGGGCTTCTCCTGAGCTCCT

GGGACTTTCACAGTACAGCCGGAAACCTCTGAGACTCCATCATCCCTCCCTGGCCCCA

CAGGTAAATACCAGTCAATGGTATTTGGAGCATGGTTGATGAGTGTAAACATCTCTGT 

TTATACTCTGTTAGAGCATGGTTGATGAGTGTAAACATCTCTGTCATTATTCACTCAAC 

TAAAGATGGAAATTCATAGTAAATGTAGTAACCATAGGTCAACCAACCCAGTTCATT 

GAGCACTGCCTCTGTATCAGGACCTGGATATACATCAGGGAACAAAAAAAAAAAAAA
AAAA 
MFKNTSVGLLYSGCRLTLLRPEKNGAATGMDAICSHRLDPKSPGLNREQLYWELS

QLTHGIKELGPYTLDRNSLYVNGFTHRSSVAPTSTPGTSTVDLGTSGTPSSLPSPTT 

AWLLVPFTLNFXnmQYGEDMRHPGSRKFNTTERVLQGLLGPLFKNSSVGPLYS 

GCmSUlSEKDGAATGVDAICTFfflLNPQSPGLDREQLYWQLSQMTNGIKELGPY 

TLDRNSLYVNGFraniSSGLTTSTPWTSTVDLGTSGTPSPVPSPTTAGPIiVPFTM 

TnrnQYEEDMHRPGSKBaTvTATERVLQGLI^PIFKJsfSSVGPLYSGCRLTSU^ 

AATGMDAVCLYHPNPKRPGLDRBQLYWELSQLTHNTTEmPYSUJKDSLYVNGFT 

HQNSWTTSTPGTSTVYWATTGTPSSITGHTEPGPLIIPFTFNFITT^ 

PGSKKFNTTERVLQGLLKPIJKNTSVGPLYSGCRLTIXRPEKQEAATGVD'nCTHR 

VDPIGPGU)RERLYWELSQLTNSITEmPYTLDRDSLYWGFNPWSSWTTSTPGTS 

TVHLATSGTPSSLPGHTAPWLLIPFTLNFTITNLHYEE^MQHPGSRKF^^^ 

GLLKPLFKSTSVGPLYSGCRLTLLRPEKHGAATGVDAICTLRLDPTGPGLDRERLY 

WEI^QLTNSVTELGPYTLDIU:)SLYVNGFTHRSSVPTTSIPGTSAVHLETSGTPASLP 

GHTAPGPIXWFm^FT^^lS^JQYEEDMRIIPGSRKF^^TER^^ 

LYSGCRLTLLRPEKRGAATGVDTICTHRLDPLNPGLDREQLYWELSKLTRGIIELGP 

YIJLDRGSLYWGFTHRNFWITSTPGTSTVHLGTSETPSSLPRPIWGPLLWFTI^ 

TnmQYEEAMRHPGSRKPNTTERVIX^GLIJlPUfKOT 

AATRVDAICTHHPDPQSPGIJ^QLYWELSQLTHGITELGPYTLDRDSLYVDGFTH 

WSPIPTTSTPGTSIVmGTSGIPPSIJ'ETTATGPIXWFTU^JFTIT^^^QYEE 

RKFlOTESVLQGLIJO>IJfKSTSVGPLYSGCRLTLUa»EKIXjVATRVDAICT^ 

IPGIJ)RQQLYWELSQLTHSITELGPYTLDRDSLYVNGFTQRSSVPTTSSEYSTDVPM 

APILQQT*QELTPIHKPLCPFHKGRNffiDTNYSPSPLPQLIRVPAEAPQAKIPMNSPSC 

WHYXP*EHXAPFTVEGFSSAPGTFTVQPETSETPSSIJPGPTGKYQSMVFGAWLMS 

V]^VYTLLEHG*W*TS1^OTQLKMEIHSKCSNHRSTOTVH*AIPLYQD^ 

TKKKKKX 
[ Clone B4

Repeat A — Repeat B — Repeat C — Repeat D — Repeat E — Repeat F 

] 

—Repeat G Repeat H 

VPMAPILQQT* 

GTTCCCATGGCCCCAATCTTACAACAAACTTAGCAGGAGCTGACCCCTATTCA 

TAAGCCCTTATGTCCTTTCCATAAGGGAAGGAACATAGAGGACACAAATTATT 

CCCCTTCCCCACTGCCCCAGCTAATCAGAGTCCCAGCTGAAGCCCCACAGGCA 

AAAATCCCCATGAATAGTCCCTCCTGCTGGCATTACNTTCCATGAGAGCACNT 

TGCTCCTTTCACTGTTGAGGGCTTCTCCTCAGCTCCTGGGACTTTCACAGTACA 

GCCGGAAACCTCTGAGACTCCATCATCCCTCCCTGGCCCCACAGGTAAATACC 

AGTCAATGGTATTTGGAGCATGGTTGATGAGTGTAAACATCTCTGTTTATACTC 

TGTTAGAGCATGGTTGATGAGTGTAAACATCTCTGTCATTATTCACTCAACTAA 

AGATGGAAATTCATAGTAAATGTAGTAACCATAGGTCAACCAACCCAGTTCAT 

TGAGCACTGCCTCTGTATCAGGACCTGGATATACATCAGGGAACAAAAAAAA 

AAAAAAAAAA 
FIGURE 11



CCTGTGACTTCTCTTCTCACCCCTGGCCTGGTGATAACCACAGACAGGATGGGCATAA 

GCAGAGAACCTGGA^CCAGTTCCACTTCAAATTTGAGCAGCACCTCCCATGAGAGAC 

TGACCACTTTGGAAGACACTGTAGATACAGAAGCCATGCAGCCTTCCACACACACAG 

CAGTGACCAACGTGAGGA<XrrCCATTTCTGGACATGAATCACAATCTTCTGTCCTATC 

TGACTCAGAGACACCCAAAGCCACATCTCCAATGGGTACCACCTACACCATGGGGGA 

AACGAGTGTTTCCATATCCACTTCTGACTTCTTTGAGACCAGCAGAATTCAGATAGAA

CCAACATCCTCCCTGACTTCTGGATTGAGGGAGACCAGCAGCTCTGAGAGGATCAGCT 

CAGCCACAGAGGGAAGCACTGTCCTTTCTGAAGTGCCCAGTGGTGCTACCACTGAGGT 

CTCCAGGACAGAAGTGATATCCTCTAGGGGAACATCCATGTCAGGGCCTGATCAGTTC 

ACCATATCACCAGACATCTCTACTGAAGCGATCACCAGGCTTTCTACTTCCCCCATTA 

TGACAGAATCAGCAGAAAGTGCCATCACTATTGAGACAGGTTCTCCTGGGGCTACATC 

AGAGGGTACCCTCACCTTGGACACCTCAACAACAACCTTTTGGTCAGGGACCCACTCA 

ACTGCATCTCCAGGATTTTCACACTCAGAGATGACCACTCTTATGAGTAGAACTCCTG

GAGATGTGCCATGGCCGAGCCTTCCCTCTGTGGAAGAAGCCAGCTCTGTCTCTTCCTC 

ACTGTCrTCACCTGCCATGACCTCAACTTClll-rriCTCCACATTACCAGAGAGCATCr 

CCTCCTCTCCTCATCCTGTGACTGCACTTCTCACCCTTGGCCCAGTGAAGACCACAGA

CATGTTGCGCACAAGCTCAGAACCTGAAACCAGTTCACCTCCAAATTTGAGCAGCACC 

TCAGCTGAAATATTAGCCACGTCTGAAGTCACCAAAGATAGAGAGAAAATTCATCCC 

TCCTC AAACA CACCTGTAGTCAATGTAGGGACTGTGATTTATAAACATCTATCCCCTT 

CCTCTGTTTTGGCTGACTTAGTGACAACAAAACCCACATCTCCAATGGCTACCACCTC 

CACTCTGGGGAATACAAGTGTTTCCACATCAACTCCTGCCTTCCCAGAAACTATGATG 

ACACAGCCAACTTCCTCCCTGACTTCTGGATTAAGGGAGATCAGTACCTCTCAAGAGA

CCAGCTCAGCAACAGAGAGAAGTGCTTCTCTTTCTGGAATGCCCACTGGTGCTACTAC

TAAGGTCTCCAGAACAGAAGCCCTCTCCTTAGGCAGAACATCCACCCCAGGTCCTGCT 

CAATCCACAATATCACCAGAAATCTCCACGGAAACCATCACTAGAATTTCTACTCCCC 

TCACCACGACAGGATCAGCAGAAATGACCATCACCCCCAAAACAGGTCATTCTGGGG 

CATCCTCACAAGGTACCTTTACCTTGGACACATCAAGCAGAGCCTCCTGGCCAGGAAC 

TCACTCAGCTGCAACTCACAGATCTCCACACTCAGGGATGACCACTCCTATGAGCAGA 

GGTCCTGAGGATGTGTCATGGCCAAGCCGCCCATCAGTGGAAAAAACTAGCCCTCCA 

T<nTCCCTGGTGTCTITATCTGCAGTAACCrCACCTTCGCCACTTTATTCCACACCATC 

TGAGAGTAGCCACTCGTCTCCTCTCCGGGTGACTTCTCTTTTCACCCCTGTCATGATGA

AGACCACAGACATGTTGGACACAAGCTTGGAACCTGTGACCACTTCACCTCCCAGTAT 

GAATATCACCTCAGATGAGAGTCTGGCCACTTCTAAAGCCACCATGGAGACAGAGGC 

AATTCAGCTTTCAGAAAACACAGCTGTGACTCATATGGGCACCATCAGTGCTAGACAA 

GAATTCTATTCCTCTTATCCAGGCCTCCCAGAGCCATCCAA^GTGACATCTCCAATGG 

TCACCTCTTCCACCATAAAAGACATTGTTTCTACAACCATACCTGCTTCCTCTGAGATA

ACAAGAATTGAGATGGAGTCAACATCCACCCTGACCCCCACACCAAGGGAGACCAGC 

ACCTCCCAGGAGATCCACTCAGCCACAAAGCCAAGCACTGTTCCTTACAAGGCACTCA 

CTAGTGCCACGATTGAGGACTCCATGACACAAGTCATGTCCTCTAGCAGAGGACCTAG 

CCCTGATCAGTCCACAATGTCACAAGACATATCCACTGA^siGTGATCACCA 
GGCTCTCTACCTCCCCCATCAAGACAGAATCTACAGAAATGACCATTACCACCCAAACAGGT

TCTCCTGGGGCTACATCAAGGGGTACCCTTACCTTGGACACTTCAACAACTTTTATGTCAGG

GACCCATTCAACTGCATCTCAAGGATTTTCACACTCACAGATGACCGCTCTTATGAGTAGAA 

CTCCTGGAGAGGTGCCATGGCTAAGCCATCCCTCTGTGGAAGAAGCCAGCTCTGCCTCTTTC 

TCACTGTCTTCACCTGTCATGACCTCATCTTCTCCCGTTTCTTCCACATTACCAGACAGCATC 

CACTCTTCTTCGCTTCCTGTGACATCACTTCTCACCTCAGGGCTGGTGAAGACCACAGAGCTG 

TTGGGCACAAGCTCAGAACCTGAAACCAGTTCACCCCCAAATTTGAGCAGCACCTCAGCTGA 

AATACTGGCCACCACTGAAGTCACTACAGATACAGAGAAACTGGAGATGACCAATGTGGTA 

ACCTCAGGTTATACACATGAATCTCCTTCCTCTGTCCTAGCTGACTCAGTGACAACAAAGGC

CACATCTTCAATGGGTATCACCTACCCCACAGGAGATACAAATGTTCTCACATCAACCCCTG 

CXnTCTCTGACACCAGTAGGATTCAAACAAAGTCAAAGCrCTCACTGACTCCTGGGTTGATG 

GAGACCAGCATCTCTGAAGAGACCAGCTCTGCCACAGAAAAAAGCACTGTCCTTTCTAGTGT 

GCCCACTGGTGCTACTACTGAGGTCTCCAGGACAGAAGCCATCTCTTCTAGCAGAACATCCA 

TCCCAGGCCCTGCTCAATCCACAATGTCATCAGACACCTCCATGGAAACCATCACTAGAATT

TCTACCCCCCTCACAAGGAAAGAATCAACAGACATGGCCATCACCCCCAAAACAGGTCCTTC 

TGGGGCTACCTCGCAGGGTACCTTTACCTTGGACTCATCAAGCACAGCCTCCTGGCCAGGAA 

CTCACTCAGCTACAACTCAGAGATTTCCACGGTCAGTGGTGACAACTCCTATGAGCAGAGGT

CCTGAGGATGTGTCATGGCCAAGCCCGCTGTCT GTGGA AAAAAACAGCCCTCCATCTTCCCT 

GGTATCnrCATCTTCAGTAACCTCACCTTCGCCACnTrATTCCACACCATCTGGGAGTAGCCA 

CTCCTCTCCTGTCCCTGTCAC^CTCnTTTCACCTCTATCATGATGAAGGCCACAGACATGTT 

GGATGCAAGTTTGGAACCTGAGACCACITCAGCTCCCAATATGAA TATCA CCTCAGATGAGA 

GTCTGGCCGCTTCTAAAGCCACCACGGAGACAGAGGCAATTCACGTTTTTGAAAATACAGCA 

GCGTCCCATGTGGAAACCACCAGTGCTACAGAGGAACTCTATTCCTCTTCCCCAGGCTTCTC

AGAGCCAACAAAAGTGATATCTCCAGTGGTCACCTCTTCCTCTATAAGAGACAACATGGTTT

CCACAACAATGCCTGGCTCCTCTGGCATTACAAGGATTGAGATAGAGTCAATGTCATCTCTG 

ACCCCTGGACTGAGGGAGACCAGAACCTCCCAGGACATCACCTCATCCACAGAGACAAGCA 

CTGTCCTTTACAAGATGCCCTCTGGTGCCACTCCTGAGGTCTCCAGGACAGAAGTTATGCCC

TCTAGCAGAACATCCATTCCTGGCCCTGCTCAGTCCACAATGTCACTAGACATCTCCGATGA 

AGTTGTCACCAGGCTGTCTACCTCTCCCATCATGACAGAATCTGCAGAAATAACCATCACCA 

CCCAAACAGGTTATTCTCTGGCTACATCCCAGGTTACCCTTCCCrTGGGCACCTCAATGACXirr 

TTTTGTCAGGGACCCACrCAACTATGTCTCAAGGACTTTCACACTCAGAGATGACCAATC^ 

ATGAGCAGGGGTCCTGAAAGTCTGTCATGGACGAGCCCTCGCTTTGTGGAAACAACTAGATC

TTCCTCTTCTCTGACATCATTACCTCTCACGACCTCACITr^ 

GACAGTAGCCCCTCCTCTCCTCTTCCTGTGACTTCACTTATCCTCCCAGGCCTGGTGAAGACT

ACAGAAGTGTTGGATACAAGCTCAGAGCCTAAAACCAGTTCATCTCCAAATTTGAGCAGCAC 

CTCAGTTGAAATACCGGCCACCTCTGAAATCATGACAGATACAGAGAAAATTCATCCTTCCT 

CAAACACAGCGGTGGCCAAAGTGAGGACCTCCAGTTCTGTTCATGAATCTCATTCXITCTGTC 

CTAGCTGACTCAGAAACAACCATA 
ACCATACCTTCAATGGGTATCACCTCCGCTGTGGAGGATACCACTGTTTTCACATCAAATCC

TGCCnTCTCTGAGACTAGGAGGATTCCGACAGAGCCAACATTCTCATTGACTCCTGGATTCA 

GGGAGACTAGCACCTCTGAAGAGACCACCTCAATCACAGAAACAAGTGCAGTCCTTTTTGG 

AGTGCCCACTAGTGCTACTACTGAAGTCTCCATGACAGAAATAATGTCCTCTAATAGAACAC 

ACATCCCTGACTCTGATCAGTCCACGATGTCTCCAGACATCATCACTGAAGTGATCACCAGG 

CTCTCTTCCTCATCCATGATGTCAGAATCAACACAAATGACCATCACCACCCAAAAAAGTTC 

TCCTGGGGCTACAGCACAGAGTACTCTTACCnTGGCCACAACAACAGCCCCCrTGGCAAGGA 

CCCACTCAACTGTTCCTCCTAGATTTTTACACTCAGAGATGACAACTCTTATGAGTAGGAGT 

CCTGAAAATCCATCATGGAAGAGCTCTCCCriTrGTGGAAAA(\ACTAGCTCTrCAT(^ 

GTTGTCCTTACCTGTCACGACCTCACCrrcrGTTTCITCCACATTACCGCAGAGTATCCCTTC 

CTCCTCTri-lTCTGTGACTTCACTCCTCACCCCAGGCATGGTGAAGACTACAGACACAAGCA 

CAGAACCTGGAACCAGTTTATCTCCAA(\TCTGAGTGGCACCTCAGTTGAAATACTGGCTGCC 

TCTGAAGTCACCACAGATACAGAGAAAATTCATCCTTCTTCAAGCATGGCAGTGACCAATGT 

GGGAACCACCAGTTCTGGACATGAACTATATTCCTCTGTTTCAATCCACTCGGAGCCATCCA

AGGCTACATACCCAGTGGGTACrCCCTCTTCCATGGCrGAAACCTCTATTTCCACATCAATGC 

CTGCTAATTTTGAGACCACAGGATTTGAGGCTGAGCCATTTTCTCATTTGACT^ 

GGAAGACX^AACATGTCCCTGGACACCAGCTCAGTCACACCAACAAATACACCTTCTTCTCCT 

GGGTCCACTCACCTTTTACAGAGTTCCAAGACTGATTTCACCrCTTCTGCAAAAACATCATCC 

CCAGACTGGCCTCCAGCCTCACAGTATACTGAAATTCCAGTGGACATAATCACCCCCrrrAA 

TGCTTCTCCATCTATrACGGAGTCCACTGGGATAACCTCCITCCCAGAATCCAGGTTTACTAT 

GTCTOTAACAGAAAGTACTCATCATCTGAGTACAGATTTGCTGCCITCAGCrGAGACrATTT 

CCACTGGCACAGTGATGCCTTCTCTATCAGAGGCCATGACTTCATTTGCCACCACTGGAGTT 

CCACGAGCCATCTCAGGTTCAGGTAGTCCATTCTCTAGGACAGAGTCAGGCCCTCGGGATGC 

TACTCTGTCCACCATTGCAGAGAGCCTGCCTTCATCCACTCCTGTGCCATTCTCCTCTTCAAC 

CTTCACTACCACTGATTCTTCAACCATCCCAGCCCrCCATGAGATAACTTCCT(nTCAGCTAC 

CCCATATAGAGTGGACACCAGTCTTGGGACAGAGAGCAGCACTACTGAAGGACGCTTGGTT 

ATGGGGACAGAGAGCAGCACTACTGAAGGACGCTTGGTTATGGTCAGTACTTTGGACACTTC 

AAGCCAA(^AGGCAGGACATCTrCATCACCCATTTTGGATACCAGAATGACAGAGAGCGTT 

GAGCTGGGAACAGTGACAAGTGCTTATCAAGTTCCTTCACTCTCAACACGGTTGACAAGAAC 

TGATGGCATTATGGAACACATCACAAAAATACCCAATGAAGCAGCACACAGAGGTACCATA 

AGACCAGTCAAAGGCCCTCAGACATa:ACrTCGCCTGCCAGTCCTAAAGGACTACACACAG 

GAGGGACAAAAAGAATGGAGACCACCACCACAGCTCTGAAGACCACCACCACAGCTCTGAA 

GACCACTTCCAGAGCCACCTTGACCACCAGTGTCTATACTCCCACTTTGGGAACACTGACTC 

CCCTCAATGCATCAATGCAAATGGCCAGCACAATCCCCACAGAAATGATGATCACAACCCC

ATATGTTTTCCCTGATGTTCCAGAAACGACATCCTCATTGGCTACCAGCCTGGGAGCAGAAA 

CCAGCACAGCTCTTCCCAGGACAACCCCATCTGTTTTCAATAGAGAATCAGAGACCACAGCC 

TCACTGGTCTCTCGTTCTGGGGCAGAGAGAAGTCCGGTTATTCAAACTCTAGATGTTTCTTCr 

AGTGAGCCAGATACAACAGCTTCATGGGTTAT 
CCATCCTGCAGAGACCATCCCAACTGTTTCCAAGACAACCCCCAATTTTTTCCACAGTGAAT

TAGACACTGTATCTTCCACAGCCACCAGTCATGGGGCAGACGTCAGCTCAGCCATTCCAACA 

AATATCTCACCTAGTGAACTAGATGCACTGACCCCACTGGTCACTATTTCGGGGACAGATAC 

TAGTACAACATTCCCAACACTGACTAAGTCCCCACATGAAACAGAGACAAGAACCACATGG 

CTCACTCATCCTGCAGAGACCAGCTCAACTATTCCCAGAACAATCCCCAATTTTTCTCATCAT 

GAATCAGATGCXJACACCrTCAATAGCCACCAGTCCTGGGGCAGAAACCAGTrCAGCTATTCC 

AATTATGACrGTCTCACCTGGTGCAGAAGATCTGGTGACCTCACAGGTCACTAGTTCTGGCA 

CAGACAGAAATATGACTATTCCAACTITGACTCTTTCTCCTGGTGAACCAAAGACCATAGCC 

TCATTAGTCACCCATCCrGAAGCACAGACAAGTTCGGCCATrCCAACTTCAACTATCTCGCC 

TGCTGTATCACGGTTGGTGACCTCAATGGTCACCAGTTTGGCGGCAAAGACAAGTACAACTA 

ATCGAGCTCTGACAAACTCCCCTGGTGAACCAGCTACAACAGTTTCATTGGTCACGCATTCT 

GCACAGACCAGCCCAACAGTTCCCTGGACAACTTCCATTTTTTTCCATAGTAAATCAGACAC 

CACACCirCAATGACCACCAGTCATGGGGCAGAATCCAGTTCAGCTGTTCCAACTCCAACTG 

TTTCAACTGAGGTACCAGGAGTAGTGACCCCTTTGGTCACCAGTTCTAGGGCAGTGATCAGT 

ACAACrATTCCAATTCTGACTCTTTCTCCTGGTGAACCAGAGACCACACCTTCAATGGCCAC 

CAGTCATGGGGAAGAAGCCAGTTCTGCTATTCCAACTCCAACTGTTTCACCTGGGGTACCAG 

GAGTGGTGACCTCTCTGGTCACTAGTrCTAGGGCAGTGACTAGTACAACTATTCCAATTCTG 

TGGCTCAGCTGTTCCAACrGTTTTACCTGAGGTACCAGGAATGGTGACCTCTCTGGTTGCTA 

GTTCTAGGGCAGTAACCAGTACAACTCITCCAACTCTGACTCTTrCTCCTGGTGAACC^^ 

ACCACACCTTCAATGGCCACCAGTCATGGGGCAGAAGCCAGCrCAACTGTTCCAACTGTTTC 

ACCTGAGGTACCAGGAGTGGTGACCTCTCTGGTCACTAGTTCTAGTGGAGTAAACAGTACAA 

GTATTCCAACrCTGATTCITrCTCCTGGTGAACTAGAAACCACACCIT^ 

ATGGGGCAGAAGCCAGCrCAGCTGTTCCAACTCCAACTGTTTCACCTGGGGTATCAGGAGTG 

GTGAC<XCTCTGGTCACTAGTTCCAGGGCAGTGACCAGTACAACTATTCCAATTCTAi^CTCT 

TTCTTCTAGTGAGCCAGAGACCACACCTTCAATGGCCACCAGTCATGGGGTAGAAGCCAGCT 

CAGCTGTTCTAACrGTTTCACCTGAGGTACCAGGAATGGTGACCTTTCTGGTCACT 

GAGCAGTAACCAGTACAACrATTCCAACTCTGACTATTTCTTCTGATGAACCAGAGACXJACA 

ACTTCATTGGTCACCCATTCTGAGGCAAAGATGATTTCAGCCATTCCAACTTTAGGTGTCTCC 

CCTACTGTACAAGGGCTGGTGA(nTCACTGGTCACTAGTTCTGGGTCAGAGACCAGTGCGTT 

TTCAAATCTAACTOTTGCCTCAAGTCAACCAGAGACCATAGACTCATGGGTCGCTCATCCT 

GGACAGAAGCAAGTTCTGTTGTTCCAAClTTGACTGTCrCCACTGGTGAGCCGTTTACAAAT 

ATCTCATTGGTCACCCATCCrGCAGAGAGTAGCTCAACTCTTCCCAGGACAACCTCAAGGTT 

TTCCCACAGTGAATTAGACACTATGCCITCrACAGTCACCAGTCCTGAGGCAGAATCCAGCT 

CAGCCATTTCAACAACTATTTCACCTGGTATACCAGGTGTGCTGACATCACTGGTCACTAGC 

TCTGGGAGAGACATCAGTGCAACTTTTCCAACAGTGCCTGAGTCCCCACATGAATCAGAGGC 

AACAGCCTCATGGGTTACTCATCCTGCAGTCACCAGCACAACAGTTCCCAGGACAACCCCTA 

ATTATTCTCATAGTGAACCAGACACC 
ACACCATCAATAGCCACCAGTCCTGGGGCAGAAGCCACTTCAGATTTTCCAACAATAACTGT

CTCACCTGATGTACCAGATATGGTAACCTCACAGGTCACTAGTTCTGGGACAGACACCAGTA 

TAACTATTCCAACTCTGACrClllClU'CTGGTGAGCCAGAGACCACAACCTCATTTATCACCT 

ATTCrGAGACACATACAAGTTCAGCCATTCCAACTCTCCCTGTCTCCCCTGATGCATCAAAG 

ATGCTGACCTCACTGGTCATCAGTTCTGGGACAGACAGCACTACAACTTTCCCAACACTGAC 

GGAGACCCCATATGAACCAGAGACAACAGCCATACAGCTCATTCATCCTGCAGAGACCAAC 

ACAATGGTTCCCAGGACAACTCCCAAGTTTTCCCATAGTAAGTCAGACACCACACTCCCAGT 

AGCCATCACCAGTCCTGGGCCAGAAGCCAGTTCAGCTGTTTCAACGACAACTATCrCACCTG 

ATATGTCAGATCTGGTGACCTCACTGGTCCCTAGTTCTGGGACAGACACCAGTACAACCTTC 

CCAACATTGAGTGAGACCCCATATGAACCAGAGACTACAGCCACGTGGCTCACTCATCCTGC 

AGAAACCAGCACAACGGTTTCrGGGACAATTCCCAACTTTTCCCATAGGGGATCAGACACTG 

CACCCTCAATGGTCACCAGTCCTGGAGTAGACACGAGGTCAGGTGTTCCAACTACAACCATC 

CCACCCAGTATACCAGGGGTAGTGACCTCACAGGTCACTAGTTCTGCAACAGACACTAGTAC 

AGCTATTCCAACnrrGACTCCm-CTCCTGGTGAACCAGAGACCACAGCCTCATCAGCTACCC 

ATCCTGGGACACAGACTGGCTTCACTGTTCCAATTCGGACTGTTCCCTCTAGTGAGCCAGAT 

ACAATGGCTTCCTGGGTCACTCATCCTCCACAGACCAGCACACCTGTTTCCAGAACAACCTC

CAGTITTTCCCATAGTAGTCCAGATGCCACACCTGTAATGGCCACCAGTCCTAGGACAGAAG 

CCAGTTCAGCTGTACTGACAACAATCTCACCTGGTGCACCAGAGATGGTGACTTCACAGATC 

ACTAGTTCTGGGGCAGCAACCAGTACAACTGTTCCAACTTTGACTCATTCTCCTGGTATGCC 

AGAGACCACAGCCTTATTGAGCACCCATCCCAGAACAGAGACAAGTAAAACATTTCCTGCnT 

CAACTGTGTTTCCTCAAGTATCAGAGACCACAGCCTCACTCACCATTAGACCTGGTGCAGAG 

ACTAGCACAGCrCTCCCAACTCAGACAACATCCrCTCTCTTCACCCTACTTGTAACTGGAACC 

AGCAGAGTTGATCTAAGTCCAACTGCTTCACCTGGTGTTTX7rGCAAAAACAG(X:C^ 

CACCCATCCAGGGACAGAAACCAGCACAATGATTCCAACTTCAACTCirrCeCTTGGTTTAC 

TAGAGACTACAGGCTTACTGGCCACCAGCTCTTCAGCAGAGACCAGCACGAGTACTCTAACT 

CrGACTGTTTCCCCTGCTGTCTCTGGGCTTTCCAGTGCCTCTATAACAACTGATAAGCCCCAA

ACTGTGACCTCCTGGAACACAGAAACCTCACCATCTGTAACTTCAGTTGGACCCCCAGAATT

TTCCAGGACTGTCACAGGCACCACTATGACCTTGATACCATCAGAGATGCCAACACCACCTA

AAACCAGTCATGGAGAAGGAGTGAGTCCAACCACTATCTTGAGAACTACAATGGTTGAAGC

CACTAAnTAGCTACCACAGGTTCCAGTCCCACTGTGGCCAAGACAACAACCACCTTCAATA

CACTGGCTGGAAGCXn-CITTACrCCrCTGACCACACCTGGGATGTCCACCTTC^ 

AGTGTGACCTCAAGAACAAGTTATAACCATCGGTCCTGGATCTCCACCACCAGCGGTTATAA 

CCGTCGGTACTGGACCCCTGCCACCAGCACTCCAGTGACTTCTACATTCTCCCCAGGGATTTC 

CACATCCTCCATCCCCAGCrCCACAGCAGCCACAGTCCX:ATTCATGGTG(XATTCACCCrcA 

ACTTCACCATCACCAACCTGCAGTACGAGGAGGACATGCGGCACCCTGGTTCAAGGAAGTTC 

AACGCCACAGAGAGAGAACTGCAGGGTCTGCTCAAACCCTTGTTCAGGAATAGCAGTCTGG 

AATACCTCTATTCAGGCTGCAGACTAGCCTCACTCAGGCCAGAGAAGGATAGCTCAGCCACG 

GCAGTGGATGCCATCTGC 
ACACATCGCCCTGACCCTGAAGACCTCGGACTGGACAGAGAGCGACTGTACTGGGAGCTGA

GCAATCrGACAAATGGCATCCAGGAGCTGGGCCCTTACACCCTGGACCGGAACAGTCTCTAT 

GTCAATGGTTTCACCCATCGAAGCTCTATG<XCACCACCAGCACTCCTGGGACCTCCACAGT 

GGATGTGGGAACCTCAGGGACTCCATCCTCCAGCCCCAGCCCCACGACTGCTGGCCCTCTCC 

TGATGCCGTTCACCCTCAACTTCACCATCACCAACCTGCAGTACGAGGAGGACATGCGTCGC 

ACTGGCTCCAGGAAGTTCAACACCATGGAGAGTGTCCTGCAGGGTCTGCTCAAGCCATTGTT 

CAAGAACACCAGTGTTGGCCCTTTGTACTCTGGCTGCAGATTGACCTTGCTCAGGCCCGAGA 

AAGATGGGGCAGCCACTGGAGTGGATGCCATCTGCACCCACCGCCTTGACCCCAAAAGCCN 

TGGACTCAACAGGGAGCAGCTGTACTGGGAGCTAAGCAAACTGACCAATGACATTGAAGAG 

CTGGGCCCCTACACCCTGGACAGGAACAGTCTCTATGTCAATGGTTTCACCCATCAGAGCTC 

TGTGTCCGCCACCAGCACTCCTGGGACCTCCACAGTGGATCTCAGAACCTCAGGGACTCCAT 

CCTCCCTCrCCAGCCCCACAATTATGGCTGCTGGCCCrCTCCTGGTACCATTCACCCTCAACT 

TCACCATCACCAACCTGCAGTATGGGGAGGACATGGGTCACCCTGGCTCCAGGAAGTTCAAC 

ACCACAGAGAGGGTCCTGCTGGGTCrGCTTGGTCCXJATATTCAAGAACACCAGTGTTGGCC 

TCTGTACTCTGGCTGCAGACTGACCTCTCTCAGGTCCGAGAAGGATGGAGCAGCCACTGGAG 

TGGATGCCATCTGCATCCATCATCnTGACCCCAAAAGCCCTGGACTCAACAGAGAGCGGCTG 

TACTGGGAGCTGAGCCAACTGACCAATGGCATCAAAGAGCTGGGCCCCTACACCCTGGACA 

GGAACAGTCTCTATGTCAATGGTTTCACCCATCGGACCTCTGTGCCCACCACCAGCACTCCT 

GGGACCTCCACAGTGGACCITGGAACCTCAGGGACTCCATTCTCCCTCCCAAGCCCCGCAAC 

TGCTGGCCCTCTCCTGGTGCTGTTCACCO'CAACnTCACCATCACCAACCTGAAGTATGAGG 

AGGACATGCATCGCCCTGGCTCCAGGAAGTTCAACACCACTGAGAGGGTCCTGCAGACTCTG 

CTTGGTCCTATGTTCAAGAACACCAGTGTTGGCCTTCTGTACTCTGGCTGCAGACTGACCrTG 

CTCAGGTCCGAGAAGGATGGAGCAGCCACTGGAGTGGATGCCATCTGCACCCACCGTCTTG 

ACCCCAAAAGCCCTGGAGTGGACAGGGAGCAGCTATACTGGGAGCTGAGCCAACTGACCAA 

TGGCATCAAAGAGCTGGGCCCCTACACCCTGGACAGGAACAGTCTCTATGTCAATGGTTTCA 

CCCATTGGATCCCTGTGCCCACCAGCAGCCCTGGGACCTCCACAGTGGACCTTGGGTCAGGG 

ACTCCATCCrCCCTCCCCAGCCCCACAAGTGCTGCTGGCCCTCTCCTGGTGCCATTCACCCTC 

AACTTCACCATCACCAACCTGCAGTACGAGGAGGACATGCATCACCCAGGCTCCAGGAAGT 

TCAACACCACGGAGCGGGTCCTGCAGACTCTGGTTGGTCCTATGTTCAAGAACACCAGTGTT 

GGC(nTCTGTACrCTGGCTGCAGACTGACCTTGCTCAGGTCCGAGAAGGATGGAGCAGCCAC 

TGGAGTGGATGCCATCTGCACX^CACCGTCTTGACCCXAAAAGCCCTGGAG^ 

CAGCTATACTGGGAGCTGAGCCAGCTGACCAATGGCATCAAAGAGCTGGGCCCCTACACCC 

TGGACAGGAACAGTCTCTATGTCAATGGTTTCACCCATTGGATCCCTGTGCCCACCAGCAGC 

ACTCCTGGGACCTCCACAGTGGACCTTGGGTCAGGGACTCCATCCTCCCTCCCCAGCCCCAC 

AACTGCTGGCCCTCTCCTGGTGCCGTTCACCCTCAACTTCACCATCACCAACCTGAAGTACG 

AGGAGGACATGCATTGCCCTGGCTCCAGGAAGTTCAACACCACAGAGAGAGTCCTGCAGAG 

TCTGCTTGGTCCCATGTTCAAGAACACCAGTGTTGGCCCTCTGTACTCrGGCTGCAGACTGA 

CCTTGCTCAGGTCCGAGAAGGATGGAGCAGCCACTGGAGTGGATGCCATCT 
GCACCCACCGTCTTGACCCCAAAAGCCCTGGAGTGGACAGGGAGCAGCTATACTGGGAGCT

GAGCCAGCTGACCAATGGCATCAAAGAGCTGGGTCCCTACACCCTGGACAGAAACAGTCTC 

TATGTCAATGGTTTCACCCATCAGACCTCTGCGCCCAACACCAGCACTCCTGGGACCTCCAC 

AGTGGACCTTGGGACCTCAGGGACTCCATCCTCCCTCCCCAGCCCTACATCTGCTGGCCCTCT 

CCTGGTGCCATTCACCCTCAACTTCACCATCACCAACCTGCAGTACGAGGAGGACATGCATC 

ACCCAGGCTCCAGGAAGTTCAACACCACGGAGCGGGTCCTGCAGGGTCTGCTTGGTCCCATG 

TTCAAGAACACCAGTGTCGGCCTTCTGTACTCrGGCTGCAGACTGACCTTGCTCAGGCCTGA 

GAAGAATGGGGCAGCCACTGGAATGGATGCCATCTGCAGCCACCGTCTTGACCCCAAAAGC 

CCTGGACTCAACAGAGAGCAGCTGTACTGGGAGCTGAGCCAGCTGACCCATGGCATCAAAG 

AGCTGGGCCCCTACACCCTGGACAGGAACAGTCTCTATGTCAATGGTTTCACCCATCGGAGC 

TCTGTGGCCCCCACCAGCACTCCTGGGACCTCCACAGTGGACCTTGGGACCTCAGGGACTCC 

ATCCTCCCTCCCCAGCCCCACAACAGCTGTTCCTCTCCTGGTGCCGTTCACCCTCAACTTTAC 

CATCACCAATCTGCAGTATGGGGAGGACATGCGTCACCCTGGCTCCAGGAAGTTCAACACCA 

CAGAGAGGGTCCTGCAGGGTCTGCTTGGTCCCITGTTCAAGAACTCCAGTGTCGGCCCTCTG 

TACTCTGGCTGCAGACTGATCTCTCTCAGGTCTGAGAAGGATGGGGCAGCCACTGGAGTGGA 

TGCCATCTGCACCCACCACCnTAACCCTCAAAGCCCTGGACTGGACAGGGAGCAGCTGTACT 

GGCAGCTGAGCCAGATGACCAATGGCATCAAAGAGCTGGGCCCCTACACCCTGGACCGGAA 

CAGTCTCTACGTCAATGGTTTCACCCATCGGAGCTCTGGGCTCACCACCAGCACTCCTTGGA 

CITCCACAGTTGAC(nTGGAACCTCAGGGACTCCATCCCCCGTCCCCAGCCCCACAACTGCT 

GGCCCTCTCCTGGTGCCATTCACCCTAAACTTCACCATCACCAACCTGCAGTATGAGGAGGA 

CATGCATCGCCCTGGATCTAGGAAGTTCAACGCCACAGAGAGGGTCCTGCAGGGTCTGCTTA 

gtcccatatrcaagaactccagtgttggccctctgtactctggctgcagactgacctctctca 

ggcccgagaaggatggggcagcaactggaatggatgctgtctgcctctacx:accctaa'rccc 

aaaagacctgggctggacagagagcagctgtactgggagctaagccagctgacccacaaca 

tcactgagctgggcccctacagcctggacagggacagtctctatgtcaatggtttcacccat 

cagaactctgtgcccaccaccagtactcctgggacctccacagtgtactgggcaaccactgg 

gacrccatcctccttccccggccacacagagcctggccctctcctgataccattcactttcaa 

ctttaccatcaccaacctgcattatgaggaaaacatgcaacaccctggttccaggaagttca 

acaccacggagagggttctgcagggtctgctcaagcccttgttcaagaacaccagtgttggc 

cctctgtactctggctgcagactgaccttgctcagacctgagaagcaggaggcagccactgg 

agtggacacx:atctgtacccaccgcgttgatc<xatcggacctggactggacagagagcgg 

ctatactgggagctgagccagctgaccaacagcatcacagagctgggaccctacaccctgg 

atagggacagtctctatgtcaatgg(ntcaacccrtggagctctgtgccaaccaccagcact 

cctgggacctccacagtgca<xrixkk:aacctctgggactccatccrccctgccrg<^ 

agcccctgtccctctcttgataccattcaccctcaactttaccatcaccaacctgcattatga 

agaaaacatgcaacaccctggttccaggaagttcaacaccacggagagggttctgcagggt 

ctgcrcaagcccttgttcaagagcaccagcgttggccctctgtacrctggctgcagactgac 
CTTGCTCAGACCTGAGAAACATGGGGCAGCCACTGGAGTGGACGCCATCTGCACCCTCCGCCT

TGATCCCACTGGTCCTGGACTGGACAGAGAGCGGCTATACTGGGAGCTGAGCCAGCTGACCAA 

CAGCGTTACAGAGCTGGGCCCCTACACCCTGGACAGGGACAGTCTCTATGTCAATGGCTTCAC 

CCATCGGAGCTCTGTGCCAACCACCAGTATTCCTGGGACCTCTGCAGTGCACCTGGAAACCTC 

TGGGACTCCAGCCTCCCTCCCTGGCCACACAGCCCCTGGCCCTCTCCTGGTGCCATTCACCCTC 

AA(nTCACrATCACCAACCTGCAGTATGAGGAGGACATGCGTCACCCTGGTTCCAGGAAGTTC 

AACACCACGGAGAGAGTCCTGCAGGGTCTGCTCAAGCCCTTGTTCAAGAGCACCAGTGTTGGC 

CCrCTGTACTCTGGCTGCAGACTGACCTTGCTCAGGCCTGAAAAACGTGGGGCAGCCACCGGC 

GTGGACACCATCTGCACrCACCGCCTTGACCCrCTAAACCCTGGACTGGACAGAGAGCAGCTA 

TACTGGGAGCrGAGCAAACTGACCCGTGGCATCATCGAGCTGGGCCCCTACCTCCTGGACAGA 

GGCAGTCTCTATGTCAATGGTITCACCCATCGGAACITTGTGCCCATCACCAGCACTCCrGGGA 

CCTCCACAGTACACCTAGGAACCTCTGAAACTCCATCCTCCCTACCTAGACCCATAGTGCCTG 

GCCCTCTCCTGGTGCCATTCACCCTCAACTTCACCATCACCAACTTGCAGTATGAGGAGGCCAT 

GCGACACCCTGGCTCCAGGAAGTTCAATACCACGGAGAGGGTCCTACAGGGTCTGCTCAGGCC 

CnTGTTCAAGAATACCAGTATCGGCCCTCTGTACTCCAGCTGCAGACTGACCTTGCTCAGGCCA 

GAGAAGGACAAGGCAGCCACCAGAGTGGATGCCATCTGTACCCACCACCCTGACCCTCAAAG 

CCCrGGACTGAACAGAGAGCAGCTGTACTGGGAGCTGAGCCAGCTGACCCACGGCATCACrG 

AGCTGGGCCCCTACACCCTGGACAGGGACAGTCTCTATPTCGATGGTTTCACTCATTGGAGCC 

ccataccaaccaccagcactcctgggacctccatagtgaacctgggaacctctgggatcccac 

cttcccrcccrgaaactacagccaccggccctctcctggtgccattcacactcaacttcaccat 

cactaacctacagtatgaggagaacatgggtcaccctggctccaggaagttcaacatcacgga 

gagtgttctgcagggtctgcrcaagcccttgttcaagagcaccagtgttggccctcrgtattct 

ggctgcagactgaccttgctcaggcctgagaaggacggagtagccaccagagtggacgccat 

ctgcacccaccgccctgacccx:aaaatccctgggctagacagacagcagctatactgggagct 

GAGCCAGCTGACCCACAGCATCACTGAGCrGGGACCCTACACCCTGGATAGGGACAGTCTCTA 

tgtcaatggtttcacccagcggagctctgtgcccaccaccagcagtgagtattctactgatgtt 
cccatggccccaatcttacaacaaacttagcaggagctgacccctattcataagcccttatgt 

CCITTCCATAAGGGAAGGAACATAGAGGACACAAATTATTC(XCITCCCCACTGCCCCA<^ 

ATCAGAGTCCCAGCTGAAGCCCCACAGGCAAAAATCCCCATGAATAGTCCCTCCTGCTGGCAT 

TACNTTCCATGAGAGCAChm'GCrCCTTTCACTGTTGAGGGCITCTCCTCAGCTCCTGGGACT^ 

TCACAGTACAGCCGGAAACCTCTGAGACTCCATCATCCCTCCCTGGCCCCACAGGTAAATACC 

AGTCAATGGTATTTGGAGCATGGTTGATGAGTGTAAACATCTCTGTITATACTCTGTrAGAGC 

ATGGTTGATGAGTGTAAACATCTCTGTCATTATTCACTCAACTAAAGATGGAAATTCATAGTA 

AATGTAGTAACCATAGGTCAACCAACCCAGTTCATTGAGCACTGCCTCTGTATCAGGACCTGG 

ATATACATCAGGGAACAAAAAAAAAAAAAAAAAA 
FIGURE 12

MGTTYTMGETSVSISTSDFFETSiaQffiPTSSLTSGLiyETSSSERISSATBGSTVLSEVPSGATTEVSR 

TEVISSRGTSMSGPDQFTISPDISTEAITM^TSPIMTESAESAimTGSPGATSEGTLTLDTSTTTFW 

SGTHSTASPGFSHSEMTTLMSRTPGDVPWPSLPSVEEASSVSSSLSSPAMTSTSFFSTLPESISSSPH 

PVTALLTLGPVKTTDMLRTSSEPETSSPPNLSSTSAEILATSEVTKDREKIHPSSNTPVVNVGTVIY 

KHLSPSSVLADLVTTKPTSPMATTSTLGhrrSVSTSTPAFPETMMTQPTSSLTSGLRmSTSQ 

TERSASLSGMPTGATTKVSRTEAI^LGRTSTPGPAQSTISPEISTETrrRISTPLTTTGSAEMmPKT 

GHSGASSQGTFTLDTSSRASWPGTHSAATORSPHSGMTTPMSRGPEDVSWPSRPSVEKTSPPSSL 

VSLSAVTSPSPLYSTPSESSHSSPLRVTSLFIPVMMKTTDMIJDTSLEPVTTSPPSMNrrSDESLATS 

KATMETEAIQLSENTAVTHMGTISARQEFYSSYPGIJ»EPSKVTSPMVTSSTIBa)IVSTTIP^ 

lEMESTSTLTPTPRETSTSQEmSATKPSTWYKALTSATffiDSMTQVMSSSRGPSPDQSTMSQDIST 

EVrim^TSPIKTESTEMTriTQTGSPGATSRGTLTLDTSTTFMSGTEJSTASQGFSHSQMTALMSRT 

PGEWWLSHPSVEEASSASFSLSSPVMTSSSPVSSTLPDSIHSSSIPVTSLLTSGLVKTTELLGTSSE 

PETSSPPl^SSTSAEII^TTEVTTDTEKLEMTNVWSGYTHESPSSVlJyDSVTTKATSSM 

DTbTV^TSTPAFSDTSWQTKSKI^LTPGLMETSISEETSSATEKSTVLSSVFTGATTEVSRTEAISSS 

RTSIPGPAQSTMSSDTSMETITRISTPLTRKESTDlVLAriPKTGPSGATSQGTFTLDSSSTASWPGTH 

SATTQRFPRSWTTPMSRGPEDVSWPSPLSVEKNSPPSSLVSSSSVTSPSPLYSTPSGSSHSSPVPVT 

SlJTSIMMKATDMLDASLEPETreAPNMhnTSDESLAASKATTETEAI^^ 

ELYSSSPGFSEPTKVISPVWSSSIRDhMVSTTMPGSSGITRIEffiSMSSLTPGmETRTSQDITSSTET 

STVLY^□^^PSGATPEVSRlEVMPSSRTSIPGPAQSTMSLDISDEWTRLSTSPI^^•ESAEmTTQT^ 

SIATSQVTLPLGTSMTFEilGTHSmSQGI^HSEMTNLMSRGPESLSWTSPRrvnET^ 

LTTSLSPVSSTLLDSSPSSPLPWSLILPGLVKTTEVLDTSSEPKTSSSPNLSSTSVEIPATSEIMTDTE 

KJHPSSNTAVAKWTSSSVHESHSSVLADSETTITIPSMGITSAVEDTTVFrSNPAFSETRRIPra 

SLTPGFRETSTSEETTSITETSAVLFGVPTSATTEVSMTEIMSSNRTHIPDSDQSTMSPDIITEVr^ 

SSSSMMSESTQMTITTQKSSPGATAQSTLTLATTTAPLARTHSTWPRFLHSEMTTLMSRSPENPS 

WKSSPFVEKTSSSSSLLSLPVTTSPSVSSTLPQSIPSSSFSVTSLLTPGMVKTTDTSiEPGTSLSPNLS 

GTSVEILAASEVTTDTEKIHPSSSMAVTNVGTTSSGHELYSSVSfflSEPSKATYPVGTPSS 

STSMPANFETTGFEAEPFSHLTSGLRKTNMSIJ:)TSS\nnPTNTPSSPGSTHLLQSSKTO^ 

PDWPPASQYTEIPVDIITPFNASPSITESTGITSFPESRFIMSVTESTHHI^TDLLPSAETISTGTVMP 

SLSEAMTSFATTGWRAISGSGSPFSRTESGPGDATLSTIAESLPSSTPVPFSSSTFTTTDSSTIPALH 

EirSSSATPYRVDTSLGTESSTTEGRLVMGTESSTTEGRLVMVSTLDTSSQPGRTSSSPDLDTRMTE 

SVELGTVTSAYQWSLSTRLTRTDGIMEHITKIPNEAAHRGTIRPVKGPQTSTSPASPm 

KRMETTTTALKTTTTALKTTSRATLTTSVYTPTLGTLTPLNASMQMASTIPTEMI^^ 

PETTSSLATSLGAETSTALPRTTPSVF>niESETTASLVSRSGAERSPVIQTLDVSSSEPDTTASWVI 

HPAETIPTVSKTTPNFFHSELDWSSTATSHGADVSSAIPTNISPSELDALTPLVTOGTJJreTTFP^ 

TBKPHETETRTTWLTHPAETSSTIPRTIPNFSHIffiSDATPSIATSPGAETSSAIPIMWSPGAEDLW^ 

QVTSSGTDRNMTirn.TI^PGEPKTIASLVTHPEAQTSSAIPTSTIS 
PAVSRLWSMVTSLAAKTSTT^«lALTNSPGEPATWSLVTHSAQTSPTWWT^SIFFHSKSDTTPS

MTTSHGAESSSAVPTPTVSTEWGVVTPLVTSSRAVISTriPILTLSPGEFETTPSMATSHGEEASSA 

IP1TTVSPGWGVVTSLVTSSRA.WSTTIPILTFSLGEPETTPSMATSHGTEA 

VTSLVASSRAVTSTTLPTLTLSPGEPETTPSMATSHGAEASSTVPTVSPEVPGVVTSLVTSSSGVN 

STSIPTULSFGELETIPSMATSHGAEASSAVPTPTVSPGVSGVVTPLVTSSRAVTSTTIPILTLSSSE 

PEITPSMATSHGVEASSAVLWSPEWGMVTFLWSSRAVTSTTIPTLTISSDEPETTTSLVTHSEA 

KMISAIPTLGVSPTVQGLVTSLVTSSGSETSAFSNLTVASSQPETIDSWAHPGTEASSVWTLTVS 

TGEPFIT«SLVTEffAESSSTLPRTTSRFSHSELDTMPSTVTSPEAESSSAISTTISPGIPGVLTSLVTSS 

GRDISATFPTVPESPHESEATASWVTHPAVTSTTVTRTTPNYSHSEPDTrPSIATSPGAEATSDFPTI 

TVSPDWDMVTSQWSSGTDTSITIPTLTIJSSGEPETITSFITYSE1OTSSAIP11.PVS 

VISSGTDSTTTFPTLTETPYH»ETTAIQLIHPAETlsriMVPRTrPKFSHSKSDT^^ 

VSTTmPDMSDLVTSLWSSGTDTSTTFPTLSETPYEPETTATWLTHPAETSTTVSGT^^ 

DTAPSMVTSPGVDTRSGWTTTIPPSIPGWTSQWSSATDTSTAIPTLTPSPGEPET^^ 

QTGFTWIRTVPSSEFDTMASWVTHPFQTSTPVSRTTSSFSHSSPDATPVMATSFRTEASSAVLTTI 

SPGAPEMVTSQITSSGAATSTTWTLTHSPGMFETTALI^THPRTETSKTFPASTVFPQVS 

TIRPGAETSTALFrQTrSSLFTLLVTGTSRVDLSFTASPGVSAKTAPLSTHPGTETSTMIPTSTLSLG 

LLETTGLIATSSSAETSTSTLTLTVSPAVSGLSSASITTDKPQTVTSWNTETSPSVTSVGPPEFSRT 

VTGTTMTLIPSEMPTPPKTSHGEGVSPTTILRTTMVEATNLATTGSSPW 

PLTTFGMSTLASESWSRTSYNHRSWISTTSGYNRRYWPATSTPWSTFSPGISTSSIPSSTAATW 

FMVPFn.NFTITNLQYEEDMRHPGSRKFNATERELQGLLKFLFRNSSLEYLYSGCRl,i^ 

SATAVDAICTHRPDPEDLGUDRERLYWEI^>ILTNGIQELGPY1T.DRNSLYWGFTHRSSMPTTST 

PGTSTVDVGTSGTPSSSFSPTTAGPLLMPITXNFITimQYEEDMRRTGSRKFm^ 

LFKNTSVGPLYSGCRLTLLRPEKI)GAATGVDMCniRLDPKSXGLNREQLYWEI^KLT>n>ffi 

PYTLDRNSLYVNGFraQSSVSATSTPGTSTVDLRTSGTPSSLSSPTIMAAGPIXVPFTI^^ 

YGEDMGHPGSRKFlSriTERVIXGLLGFIFKNTSVGPLYSGCRLTSLRSEKDGAATGVDAICIHHI^ 

PKSPGLNRERLYWELSQLTNGIKELGFYTLDRNSLYWGFIHRTSVPTTSTPGTSTVDLGTSGTPF 

SLPSPATAGFIXV1JTI24FTITNLKYEEDMHRPGSRKTT^ 

RLTLLRSEKDGAATGVDAICniRLDPKSPGVDREQLYWEl^QLTNGIKELGPYTLDRN^ 

FTHWIPVFrSSPGTSTVDLGSGTPSSLPSPTSAAGPLLWFIl-NFTITI^QYEEDMHHPGSRKFN 

ERVLQTLVGFMFKNTSVGLLYSGCRLTLLRSEKDGAATGVDAICniRLDPKSPGVDREQLYWE^ 

SQLTNGIKEIXS^PYTLDRNSLYVNGFmWIPVFTSSTPGTSTVDLGSGTFSSLPSFTTAGFLLVPFTL 

]^ITNLKYEED]VfflCPGSRiaOTTERVLQSLIX5PMFKNTSVGPL^^ 

DAICrnna.DPKSPGVDREQLYWELSQLTNGIKELGPYTII)RNSLYVNGFmQTSAPlvm 

VDLGTSGTPSSLPSPTSAGPLLWFILNFnTNLQYEEDMHHPGSRKFNTTERVLQGLLGPMFl^ 

SVGLLYSGCRLTIXRPEKNGAATGMDAICSHRLDFKSFGL>mEQLYWEI^QLTHGIKELGPYTLD 

KNSLYVNGFnmSSVAPTSTPGTSTVDLGTSGTPSSLPSPTTAWIXWFIl.NFlT^^ 

HPGSRKFNTTmiVLQGLIXS^PIJ^SSVGPLYSGCRIJSLRSEKDGAATGVDAIC 
THHLNPQSPGLDREQLWQI^QMTNGIKELGPYTLDimSLYVNGFTHRSSGLTTSTPWTSTVDL

GTSGTPSPWSFITAGPLLWFrL>«TITNLQYEEDMHRPGSRKFNATERVLQGLLSPIFKNSSVGP 

LYSGCKLTSLia>EKDGAATGMDAVCLYHPNPKRPGLDREQLYWELSQLTHNITELGPYSLDRDS 

LYVNGFrHQNSVPTTSTPGTSTVWATrGTPSSFPGHTEPGPLLIPFTFNmTbn.HYEE^ 

SRBGfWTERVLQGLLKPLFKNTSVGPLYSGCRLTLLRPEKQEAATGVDTICTHRVDPIG^ 

RLYWELSQLTNSrim,GPYTLDRDSLYVNGFM>WSSVPTTSTPGTSTVHLATSGT^^ 

PLLIPFTLNFrnmHYEENMQHPGSRKFbnTERVLQGLLKPIJ?KSTSVGPL^^ 

GAATGVDAICTLRLDPTGPGLDRERLYWELSQLTNSVTELGPYTLDRDSLYVNGFTHRSSVPTTS 

IPGTSAVHLETSGTPASLPGHTAPGPLLWFmsnETITNLQYEEDMRHPGSRKI^^ 

PUTKSTSVGPLYSGCRLTLLRPEKaiGAATGVDTICimLDPLNPGIX)REQLYWEI^KLTO^ 

PYLLDRGSLYWGFTHR>JFWITSTPGTSTVHLGTSETPSSLPRPIWGPIXVPFTLNraT^ 

AMRHPGSRK3^^JTTERVLQGLLRPLFB□^^•SIGPLYSSCRLTLLRPEKDKAATOVDA^CTHHPDPQSP 

GIJ^^REQLYWEI ^QLTH G^IELGPYTLDRDSLYVDGFmWSPIPTTSTPGTSIV^n:GTSGI^^ 

TATGPIXWFTLNFITimQYEENMGHPGSRiawreSVLQGIiKPLFKSTSVGPL^^ 

PEKI)GVATRVDAICTEiRPDPKIPGUDRQQLYWEIJSQLTHSrreiX}PYTIX)RDSLYVNG^^ 
PTTSSEYSTDVPMAPILQQT 